
ZCURVE_V: a new self-training system for recognizing protein-coding genes in viral and phage genomes

BACKGROUND: It is necessary to use highly accurate and statistics-based systems for viral and phage genome annotations. The GeneMark systems for gene-finding in virus and phage genomes suffer from some basic drawbacks. This paper puts forward an alternative approach for viral and phage gene-finding to...


Bibliographic Details
Main authors: Guo, Feng-Biao; Zhang, Chun-Ting
Format: Text
Language: English
Published: BioMed Central 2006
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1352377/
https://www.ncbi.nlm.nih.gov/pubmed/16401352
http://dx.doi.org/10.1186/1471-2105-7-9
_version_ 1782126671754690560
author Guo, Feng-Biao
Zhang, Chun-Ting
author_facet Guo, Feng-Biao
Zhang, Chun-Ting
author_sort Guo, Feng-Biao
collection PubMed
description BACKGROUND: It is necessary to use highly accurate and statistics-based systems for viral and phage genome annotations. The GeneMark systems for gene-finding in virus and phage genomes suffer from some basic drawbacks. This paper puts forward an alternative approach for viral and phage gene-finding to improve the quality of annotations, particularly for newly sequenced genomes. RESULTS: The new system ZCURVE_V has been run on 979 viral and 212 phage genomes, and satisfactory results are obtained. To have a fair comparison with the currently available software of similar function, GeneMark, a total of 30 viral genomes that have not been annotated by GeneMark are selected to be tested. The average specificity of both systems is well matched; however, the average sensitivity of ZCURVE_V for smaller viral genomes (< 100 kb), which constitute the majority of viral genomes sequenced so far, is higher than that of GeneMark. Additionally, for the genome of Amsacta moorei entomopoxvirus, probably with the lowest genomic GC content among the sequenced organisms, the accuracy of ZCURVE_V is much better than that of GeneMark, because the latter predicts hundreds of false-positive genes. ZCURVE_V is also used to analyze well-studied genomes, such as HIV-1, HBV and SARS-CoV. Overall, the performance of ZCURVE_V is generally better than that of GeneMark. Finally, ZCURVE_V may be downloaded and run locally, which makes it easier to use, whereas GeneMark is not downloadable. Based on the above comparison, it is suggested that ZCURVE_V may serve as a preferred gene-finding tool for newly sequenced viral and phage genomes. However, it is also shown that the joint application of both systems, ZCURVE_V and GeneMark, leads to better gene-finding results. The system ZCURVE_V is freely available at: .
CONCLUSION: ZCURVE_V may serve as a preferred gene-finding tool for viral and phage genomes, especially for newly sequenced anonymous viral and phage genomes.
format Text
id pubmed-1352377
institution National Center for Biotechnology Information
language English
publishDate 2006
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-1352377 2006-01-30 ZCURVE_V: a new self-training system for recognizing protein-coding genes in viral and phage genomes Guo, Feng-Biao Zhang, Chun-Ting BMC Bioinformatics Software BioMed Central 2006-01-10 /pmc/articles/PMC1352377/ /pubmed/16401352 http://dx.doi.org/10.1186/1471-2105-7-9 Text en Copyright © 2006 Guo and Zhang; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Software
Guo, Feng-Biao
Zhang, Chun-Ting
ZCURVE_V: a new self-training system for recognizing protein-coding genes in viral and phage genomes
title ZCURVE_V: a new self-training system for recognizing protein-coding genes in viral and phage genomes
topic Software
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1352377/
https://www.ncbi.nlm.nih.gov/pubmed/16401352
http://dx.doi.org/10.1186/1471-2105-7-9