
Improving gene annotation of complete viral genomes

Gene annotation in viruses often relies upon similarity search methods. These methods possess high specificity but some genes may be missed, either those unique to a particular genome or those highly divergent from known homologs. To identify potentially missing viral genes we have analyzed all comp...


Bibliographic Details
Main authors: Mills, Ryan; Rozanov, Michael; Lomsadze, Alexandre; Tatusova, Tatiana; Borodovsky, Mark
Format: Text
Language: English
Published: Oxford University Press 2003
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC290248/
https://www.ncbi.nlm.nih.gov/pubmed/14627837
http://dx.doi.org/10.1093/nar/gkg878
author Mills, Ryan
Rozanov, Michael
Lomsadze, Alexandre
Tatusova, Tatiana
Borodovsky, Mark
collection PubMed
description Gene annotation in viruses often relies upon similarity search methods. These methods possess high specificity but some genes may be missed, either those unique to a particular genome or those highly divergent from known homologs. To identify potentially missing viral genes we have analyzed all complete viral genomes currently available in GenBank with a specialized and augmented version of the gene finding program GeneMarkS. In particular, by implementing genome-specific self-training protocols we have better adjusted the GeneMarkS statistical models to sequences of viral genomes. Hundreds of new genes were identified, some in well studied viral genomes. For example, a new gene predicted in the genome of the Epstein–Barr virus was shown to encode a protein similar to α-herpesvirus minor tegument protein UL14 with heat shock functions. Convincing evidence of this similarity was obtained after only 12 PSI-BLAST iterations. In another example, several iterations of PSI-BLAST were required to demonstrate that a gene predicted in the genome of Alcelaphine herpesvirus 1 encodes a BALF1-like protein which is thought to be involved in apoptosis regulation and, potentially, carcinogenesis. New predictions were used to refine annotations of viral genomes in the RefSeq collection curated by the National Center for Biotechnology Information. Importantly, even in those cases where no sequence similarities were detected, GeneMarkS significantly reduced the number of primary targets for experimental characterization by identifying the most probable candidate genes. The new genome annotations were stored in VIOLIN, an interactive database which provides access to similarity search tools for up-to-date analysis of predicted viral proteins.
format Text
id pubmed-290248
institution National Center for Biotechnology Information
language English
publishDate 2003
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-290248 2003-12-23 Improving gene annotation of complete viral genomes Mills, Ryan Rozanov, Michael Lomsadze, Alexandre Tatusova, Tatiana Borodovsky, Mark Nucleic Acids Res Articles
Oxford University Press 2003-12-01 /pmc/articles/PMC290248/ /pubmed/14627837 http://dx.doi.org/10.1093/nar/gkg878 Text en Copyright © 2003 Oxford University Press
title Improving gene annotation of complete viral genomes
topic Articles
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC290248/
https://www.ncbi.nlm.nih.gov/pubmed/14627837
http://dx.doi.org/10.1093/nar/gkg878