PHANOTATE: a novel approach to gene identification in phage genomes
MOTIVATION: Currently there are no tools specifically designed for annotating genes in phages. Several tools are available that have been adapted to run on phage genomes, but due to their underlying design, they are unable to capture the full complexity of phage genomes. Phages have adapted their ge...
Main authors: | McNair, Katelyn; Zhou, Carol; Dinsdale, Elizabeth A; Souza, Brian; Edwards, Robert A |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press, 2019 |
Subjects: | Original Papers |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6853651/ https://www.ncbi.nlm.nih.gov/pubmed/31329826 http://dx.doi.org/10.1093/bioinformatics/btz265 |
_version_ | 1783470075620098048 |
---|---|
author | McNair, Katelyn; Zhou, Carol; Dinsdale, Elizabeth A; Souza, Brian; Edwards, Robert A |
author_facet | McNair, Katelyn; Zhou, Carol; Dinsdale, Elizabeth A; Souza, Brian; Edwards, Robert A |
author_sort | McNair, Katelyn |
collection | PubMed |
description | MOTIVATION: Currently there are no tools specifically designed for annotating genes in phages. Several tools are available that have been adapted to run on phage genomes, but due to their underlying design, they are unable to capture the full complexity of phage genomes. Phages have adapted their genomes to be extremely compact, having adjacent genes that overlap and genes completely inside other, longer genes. This non-delineated genome structure makes gene prediction difficult for the currently available gene annotators. Here we present PHANOTATE, a novel method for gene calling specifically designed for phage genomes. Although the compact nature of genes in phages is a problem for current gene annotators, we exploit this property by treating a phage genome as a network of paths, where open reading frames are favorable, and overlaps and gaps are less favorable but still possible. We represent this network of connections as a weighted graph, and use dynamic programming to find the optimal path. RESULTS: We compare PHANOTATE to other gene callers by annotating a set of 2133 complete phage genomes from GenBank, using PHANOTATE and the three most popular gene callers. We found that the four programs agree on 82% of the total predicted genes, with PHANOTATE predicting more genes than the other three. We searched for these extra genes in both GenBank’s non-redundant protein database and all of the metagenomes in the sequence read archive, and found that they are present at levels that suggest that these are functional protein-coding genes. AVAILABILITY AND IMPLEMENTATION: https://github.com/deprekate/PHANOTATE SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online. |
format | Online Article Text |
id | pubmed-6853651 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2019 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-6853651 2019-11-19 PHANOTATE: a novel approach to gene identification in phage genomes McNair, Katelyn Zhou, Carol Dinsdale, Elizabeth A Souza, Brian Edwards, Robert A Bioinformatics Original Papers Oxford University Press 2019-11-15 2019-04-25 /pmc/articles/PMC6853651/ /pubmed/31329826 http://dx.doi.org/10.1093/bioinformatics/btz265 Text en © The Author(s) 2019. Published by Oxford University Press. http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Original Papers McNair, Katelyn Zhou, Carol Dinsdale, Elizabeth A Souza, Brian Edwards, Robert A PHANOTATE: a novel approach to gene identification in phage genomes |
title | PHANOTATE: a novel approach to gene identification in phage genomes |
title_full | PHANOTATE: a novel approach to gene identification in phage genomes |
title_fullStr | PHANOTATE: a novel approach to gene identification in phage genomes |
title_full_unstemmed | PHANOTATE: a novel approach to gene identification in phage genomes |
title_short | PHANOTATE: a novel approach to gene identification in phage genomes |
title_sort | phanotate: a novel approach to gene identification in phage genomes |
topic | Original Papers |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6853651/ https://www.ncbi.nlm.nih.gov/pubmed/31329826 http://dx.doi.org/10.1093/bioinformatics/btz265 |
work_keys_str_mv | AT mcnairkatelyn phanotateanovelapproachtogeneidentificationinphagegenomes AT zhoucarol phanotateanovelapproachtogeneidentificationinphagegenomes AT dinsdaleelizabetha phanotateanovelapproachtogeneidentificationinphagegenomes AT souzabrian phanotateanovelapproachtogeneidentificationinphagegenomes AT edwardsroberta phanotateanovelapproachtogeneidentificationinphagegenomes |
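
The abstract in this record describes treating a phage genome as a weighted graph in which open reading frames are favorable and gaps and overlaps are penalized, with dynamic programming used to find the optimal path. The following Python sketch illustrates that general idea only; it is not PHANOTATE's implementation. The ORF tuples, the scores, the `gap_penalty` and `overlap_penalty` values, and the function name `best_gene_path` are all hypothetical, and the sketch ignores strand, reading frame, and genes nested entirely inside other genes.

```python
from typing import List, Tuple

# Hypothetical ORF representation: (start, end, score); a higher score
# means the ORF looks more gene-like. These are not PHANOTATE's weights.
Orf = Tuple[int, int, float]


def best_gene_path(
    orfs: List[Orf],
    gap_penalty: float = 0.01,      # assumed cost per base of gap
    overlap_penalty: float = 0.05,  # assumed cost per base of overlap
) -> Tuple[float, List[Orf]]:
    """Return (total_score, chain) for the highest-scoring chain of ORFs.

    Nodes are ORFs, edges connect an ORF to any later ORF that ends
    after it, and edge weights penalize the gap or overlap between them.
    """
    if not orfs:
        return 0.0, []

    ordered = sorted(orfs, key=lambda o: (o[0], o[1]))
    n = len(ordered)
    best = [0.0] * n   # best[j]: best score of a chain ending at ORF j
    prev = [-1] * n    # back-pointer used to rebuild the chain

    for j in range(n):
        best[j] = ordered[j][2]  # chain consisting of ORF j alone
        for i in range(j):
            # Skip pairs where j does not extend past i (nested ORFs).
            if ordered[i][1] >= ordered[j][1]:
                continue
            dist = ordered[j][0] - ordered[i][1]
            if dist >= 0:
                edge = -gap_penalty * dist        # gap between ORFs
            else:
                edge = -overlap_penalty * -dist   # overlapping ORFs
            candidate = best[i] + edge + ordered[j][2]
            if candidate > best[j]:
                best[j] = candidate
                prev[j] = i

    # Trace back from the best-scoring end node.
    j = max(range(n), key=lambda k: best[k])
    total = best[j]
    chain: List[Orf] = []
    while j != -1:
        chain.append(ordered[j])
        j = prev[j]
    chain.reverse()
    return total, chain


if __name__ == "__main__":
    toy_orfs = [(0, 300, 3.0), (290, 600, 2.5), (310, 500, 1.0), (650, 900, 2.0)]
    score, chain = best_gene_path(toy_orfs)
    print(score, chain)
```

In this toy input the chain accepts a short overlap between the first two ORFs because the penalty is smaller than the score gained, which mirrors the abstract's point that overlaps and gaps are less favorable but still possible.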