Similarity-based gene detection: using COGs to find evolutionarily-conserved ORFs

BACKGROUND: Experimental verification of gene products has not kept pace with the rapid growth of microbial sequence information. However, existing annotations of gene locations contain sufficient information to screen for probable errors. Furthermore, comparisons among genomes become more informati...

Bibliographic Details
Main authors: Powell, Bradford C, Hutchison, Clyde A
Format: Text
Language: English
Published: BioMed Central 2006
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1386717/
https://www.ncbi.nlm.nih.gov/pubmed/16423288
http://dx.doi.org/10.1186/1471-2105-7-31
_version_ 1782126884357668864
author Powell, Bradford C
Hutchison, Clyde A
collection PubMed
description BACKGROUND: Experimental verification of gene products has not kept pace with the rapid growth of microbial sequence information. However, existing annotations of gene locations contain sufficient information to screen for probable errors. Furthermore, comparisons among genomes become more informative as more genomes are examined. We studied all open reading frames (ORFs) of at least 30 codons from the genomes of 27 sequenced bacterial strains. We grouped the potential peptide sequences encoded from the ORFs by forming Clusters of Orthologous Groups (COGs). We used this grouping in order to find homologous relationships that would not be distinguishable from noise when using simple BLAST searches. Although COG analysis was initially developed to group annotated genes, we applied it to the task of grouping anonymous DNA sequences that may encode proteins. RESULTS: "Mixed COGs" of ORFs (clusters in which some sequences correspond to annotated genes and some do not) are attractive targets when seeking errors of gene prediction. Examination of mixed COGs reveals some situations in which genes appear to have been missed in current annotations and a smaller number of regions that appear to have been annotated as gene loci erroneously. This technique can also be used to detect potential pseudogenes or sequencing errors. Our method uses an adjustable parameter for degree of conservation among the studied genomes (stringency). We detail results for one level of stringency at which we found 83 potential genes which had not previously been identified, 60 potential pseudogenes, and 7 sequences with existing gene annotations that are probably incorrect. CONCLUSION: Systematic study of sequence conservation offers a way to improve existing annotations by identifying potentially homologous regions where the annotation of the presence or absence of a gene is inconsistent among genomes.
format Text
id pubmed-1386717
institution National Center for Biotechnology Information
language English
publishDate 2006
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-1386717 2006-03-02 Similarity-based gene detection: using COGs to find evolutionarily-conserved ORFs Powell, Bradford C Hutchison, Clyde A BMC Bioinformatics Methodology Article BACKGROUND: Experimental verification of gene products has not kept pace with the rapid growth of microbial sequence information. However, existing annotations of gene locations contain sufficient information to screen for probable errors. Furthermore, comparisons among genomes become more informative as more genomes are examined. We studied all open reading frames (ORFs) of at least 30 codons from the genomes of 27 sequenced bacterial strains. We grouped the potential peptide sequences encoded from the ORFs by forming Clusters of Orthologous Groups (COGs). We used this grouping in order to find homologous relationships that would not be distinguishable from noise when using simple BLAST searches. Although COG analysis was initially developed to group annotated genes, we applied it to the task of grouping anonymous DNA sequences that may encode proteins. RESULTS: "Mixed COGs" of ORFs (clusters in which some sequences correspond to annotated genes and some do not) are attractive targets when seeking errors of gene prediction. Examination of mixed COGs reveals some situations in which genes appear to have been missed in current annotations and a smaller number of regions that appear to have been annotated as gene loci erroneously. This technique can also be used to detect potential pseudogenes or sequencing errors. Our method uses an adjustable parameter for degree of conservation among the studied genomes (stringency). We detail results for one level of stringency at which we found 83 potential genes which had not previously been identified, 60 potential pseudogenes, and 7 sequences with existing gene annotations that are probably incorrect. CONCLUSION: Systematic study of sequence conservation offers a way to improve existing annotations by identifying potentially homologous regions where the annotation of the presence or absence of a gene is inconsistent among genomes. BioMed Central 2006-01-19 /pmc/articles/PMC1386717/ /pubmed/16423288 http://dx.doi.org/10.1186/1471-2105-7-31 Text en Copyright © 2006 Powell and Hutchison; licensee BioMed Central Ltd.
title Similarity-based gene detection: using COGs to find evolutionarily-conserved ORFs
topic Methodology Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1386717/
https://www.ncbi.nlm.nih.gov/pubmed/16423288
http://dx.doi.org/10.1186/1471-2105-7-31