Detecting overlapping coding sequences in virus genomes
BACKGROUND: Detecting new coding sequences (CDSs) in viral genomes can be difficult for several reasons. The typically compact genomes often contain a number of overlapping coding and non-coding functional elements, which can result in unusual patterns of codon usage; conservation between related se...
Main Authors: | Firth, Andrew E; Brown, Chris M |
---|---|
Format: | Text |
Language: | English |
Published: | BioMed Central, 2006 |
Subjects: | Software |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1395342/ https://www.ncbi.nlm.nih.gov/pubmed/16483358 http://dx.doi.org/10.1186/1471-2105-7-75 |
_version_ | 1782126951468630016 |
---|---|
author | Firth, Andrew E Brown, Chris M |
author_facet | Firth, Andrew E Brown, Chris M |
author_sort | Firth, Andrew E |
collection | PubMed |
description | BACKGROUND: Detecting new coding sequences (CDSs) in viral genomes can be difficult for several reasons. The typically compact genomes often contain a number of overlapping coding and non-coding functional elements, which can result in unusual patterns of codon usage; conservation between related sequences can be difficult to interpret – especially within overlapping genes; and viruses often employ non-canonical translational mechanisms – e.g. frameshifting, stop codon read-through, leaky-scanning and internal ribosome entry sites – which can conceal potentially coding open reading frames (ORFs). RESULTS: In a previous paper we introduced a new statistic – MLOGD (Maximum Likelihood Overlapping Gene Detector) – for detecting and analysing overlapping CDSs. Here we present (a) an improved MLOGD statistic, (b) a greatly extended suite of software using MLOGD, (c) a database of results for 640 virus sequence alignments, and (d) a web-interface to the software and database. Tests show that, from an alignment with just 20 mutations, MLOGD can discriminate non-overlapping CDSs from non-coding ORFs with a typical accuracy of up to 98%, and can detect CDSs overlapping known CDSs with a typical accuracy of 90%. In addition, the software produces a variety of statistics and graphics, useful for analysing an input multiple sequence alignment. CONCLUSION: MLOGD is an easy-to-use tool for virus genome annotation, detecting new CDSs – in particular overlapping or short CDSs – and for analysing overlapping CDSs following frameshift sites. The software, web-server, database and supplementary material are available at . |
format | Text |
id | pubmed-1395342 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2006 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-13953422006-04-14 Detecting overlapping coding sequences in virus genomes Firth, Andrew E Brown, Chris M BMC Bioinformatics Software BACKGROUND: Detecting new coding sequences (CDSs) in viral genomes can be difficult for several reasons. The typically compact genomes often contain a number of overlapping coding and non-coding functional elements, which can result in unusual patterns of codon usage; conservation between related sequences can be difficult to interpret – especially within overlapping genes; and viruses often employ non-canonical translational mechanisms – e.g. frameshifting, stop codon read-through, leaky-scanning and internal ribosome entry sites – which can conceal potentially coding open reading frames (ORFs). RESULTS: In a previous paper we introduced a new statistic – MLOGD (Maximum Likelihood Overlapping Gene Detector) – for detecting and analysing overlapping CDSs. Here we present (a) an improved MLOGD statistic, (b) a greatly extended suite of software using MLOGD, (c) a database of results for 640 virus sequence alignments, and (d) a web-interface to the software and database. Tests show that, from an alignment with just 20 mutations, MLOGD can discriminate non-overlapping CDSs from non-coding ORFs with a typical accuracy of up to 98%, and can detect CDSs overlapping known CDSs with a typical accuracy of 90%. In addition, the software produces a variety of statistics and graphics, useful for analysing an input multiple sequence alignment. CONCLUSION: MLOGD is an easy-to-use tool for virus genome annotation, detecting new CDSs – in particular overlapping or short CDSs – and for analysing overlapping CDSs following frameshift sites. The software, web-server, database and supplementary material are available at . BioMed Central 2006-02-16 /pmc/articles/PMC1395342/ /pubmed/16483358 http://dx.doi.org/10.1186/1471-2105-7-75 Text en Copyright © 2006 Firth and Brown, licensee BioMed Central Ltd. 
http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License ( (http://creativecommons.org/licenses/by/2.0) ), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Software Firth, Andrew E Brown, Chris M Detecting overlapping coding sequences in virus genomes |
title | Detecting overlapping coding sequences in virus genomes |
title_full | Detecting overlapping coding sequences in virus genomes |
title_fullStr | Detecting overlapping coding sequences in virus genomes |
title_full_unstemmed | Detecting overlapping coding sequences in virus genomes |
title_short | Detecting overlapping coding sequences in virus genomes |
title_sort | detecting overlapping coding sequences in virus genomes |
topic | Software |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1395342/ https://www.ncbi.nlm.nih.gov/pubmed/16483358 http://dx.doi.org/10.1186/1471-2105-7-75 |
work_keys_str_mv | AT firthandrewe detectingoverlappingcodingsequencesinvirusgenomes AT brownchrism detectingoverlappingcodingsequencesinvirusgenomes |