
PHANOTATE: a novel approach to gene identification in phage genomes

MOTIVATION: Currently there are no tools specifically designed for annotating genes in phages. Several tools are available that have been adapted to run on phage genomes, but due to their underlying design, they are unable to capture the full complexity of phage genomes. Phages have adapted their ge...


Bibliographic Details
Main authors: McNair, Katelyn; Zhou, Carol; Dinsdale, Elizabeth A; Souza, Brian; Edwards, Robert A
Format: Online Article Text
Language: English
Published: Oxford University Press 2019
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6853651/
https://www.ncbi.nlm.nih.gov/pubmed/31329826
http://dx.doi.org/10.1093/bioinformatics/btz265
author McNair, Katelyn
Zhou, Carol
Dinsdale, Elizabeth A
Souza, Brian
Edwards, Robert A
collection PubMed
description MOTIVATION: Currently there are no tools specifically designed for annotating genes in phages. Several tools have been adapted to run on phage genomes, but because of their underlying design they are unable to capture the full complexity of phage genomes. Phages have adapted their genomes to be extremely compact, with adjacent genes that overlap and genes that lie completely inside other, longer genes. This non-delineated genome structure makes gene prediction difficult for the currently available gene annotators. Here we present PHANOTATE, a novel method for gene calling specifically designed for phage genomes. Although the compact nature of genes in phages is a problem for current gene annotators, we exploit this property by treating a phage genome as a network of paths in which open reading frames are favorable, and overlaps and gaps are less favorable but still possible. We represent this network of connections as a weighted graph, and use dynamic programming to find the optimal path.
RESULTS: We compare PHANOTATE to other gene callers by annotating a set of 2133 complete phage genomes from GenBank, using PHANOTATE and the three most popular gene callers. We found that the four programs agree on 82% of the total predicted genes, with PHANOTATE predicting more genes than the other three. We searched for these extra genes in both GenBank’s non-redundant protein database and all of the metagenomes in the sequence read archive, and found that they are present at levels that suggest that these are functional protein-coding genes.
AVAILABILITY AND IMPLEMENTATION: https://github.com/deprekate/PHANOTATE
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
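The graph formulation described in the abstract can be illustrated with a toy sketch. This is not PHANOTATE's code or its scoring scheme: the ORF weights, the penalty constants, and the names `connection_cost` and `best_path` are all assumptions made up for illustration, and the sketch handles a single strand and does not link nested ORFs. It only shows the general idea of scoring ORFs as favorable edges, scoring gaps and overlaps as costly connections, and using dynamic programming to pick the lowest-cost path across the genome.

```python
from typing import List, Tuple

# (start, stop, weight) with 0-based, half-open coordinates.
# A negative weight marks a favorable (gene-like) open reading frame.
Orf = Tuple[int, int, float]

GAP_PENALTY = 0.05      # assumed cost per base of non-coding gap
OVERLAP_PENALTY = 0.2   # assumed cost per base of overlap between ORFs


def connection_cost(prev_stop: int, next_start: int) -> float:
    """Cost of the edge joining one ORF's stop to the next ORF's start."""
    distance = next_start - prev_stop
    if distance >= 0:
        return GAP_PENALTY * distance        # gap: possible but penalized
    return OVERLAP_PENALTY * -distance       # overlap: possible but penalized


def best_path(orfs: List[Orf], genome_length: int) -> Tuple[float, List[Orf]]:
    """Return (total cost, chosen ORFs) for the lowest-cost path."""
    ordered = sorted(orfs)                   # by start, then stop
    best: List[float] = []                   # best[i]: cheapest path ending in ORF i
    back: List[int] = []                     # back[i]: previous ORF on that path, or -1

    for i, (start, stop, weight) in enumerate(ordered):
        # Option 1: ORF i is the first gene after the genome start.
        cost = connection_cost(0, start) + weight
        prev = -1
        # Option 2: ORF i follows an earlier ORF j that ends before it does.
        for j in range(i):
            prev_stop = ordered[j][1]
            if prev_stop >= stop:
                continue                     # nested ORFs are skipped in this toy version
            candidate = best[j] + connection_cost(prev_stop, start) + weight
            if candidate < cost:
                cost, prev = candidate, j
        best.append(cost)
        back.append(prev)

    # Close the path at the genome end; the empty path is one long gap.
    total = connection_cost(0, genome_length)
    last = -1
    for i, (_, stop, _) in enumerate(ordered):
        candidate = best[i] + connection_cost(stop, genome_length)
        if candidate < total:
            total, last = candidate, i

    # Walk the back-pointers to recover the chosen ORFs in genome order.
    path: List[Orf] = []
    while last != -1:
        path.append(ordered[last])
        last = back[last]
    path.reverse()
    return total, path


if __name__ == "__main__":
    candidates = [(0, 300, -10.0), (290, 600, -8.0), (100, 250, -1.0), (650, 950, -9.0)]
    print(best_path(candidates, 1000))
```

Because the ORFs are processed in sorted order and each edge only points forward, a single pass is enough to find the optimal path. A real gene caller would also need both strands, all six reading frames, and a principled way to derive the weights.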
format Online
Article
Text
id pubmed-6853651
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-6853651 2019-11-19 PHANOTATE: a novel approach to gene identification in phage genomes McNair, Katelyn Zhou, Carol Dinsdale, Elizabeth A Souza, Brian Edwards, Robert A Bioinformatics Original Papers
Oxford University Press 2019-11-15 2019-04-25 /pmc/articles/PMC6853651/ /pubmed/31329826 http://dx.doi.org/10.1093/bioinformatics/btz265 Text en © The Author(s) 2019. Published by Oxford University Press. http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title PHANOTATE: a novel approach to gene identification in phage genomes
topic Original Papers
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6853651/
https://www.ncbi.nlm.nih.gov/pubmed/31329826
http://dx.doi.org/10.1093/bioinformatics/btz265