
Large gene overlaps in prokaryotic genomes: result of functional constraints or mispredictions?

Bibliographic Details
Main authors: Pallejà, Albert; Harrington, Eoghan D; Bork, Peer
Format: Text
Language: English
Published: BioMed Central 2008
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2478687/
https://www.ncbi.nlm.nih.gov/pubmed/18627618
http://dx.doi.org/10.1186/1471-2164-9-335
author Pallejà, Albert
Harrington, Eoghan D
Bork, Peer
collection PubMed
description BACKGROUND: Across the fully sequenced microbial genomes there are thousands of examples of overlapping genes. Many of these overlaps are only a few nucleotides long and are thought to function by permitting the coordinated regulation of gene expression. However, there should also be selective pressure against long overlaps, as the existence of overlapping reading frames increases the risk of deleterious mutations. Here we examine the longest overlaps and assess whether they are the product of special functional constraints or of erroneous annotation. RESULTS: We analysed the genes that overlap by 60 bp or more among 338 fully sequenced prokaryotic genomes. The likely functional significance of an overlap was determined by comparing each of the genes to its respective orthologs. If a gene showed a significantly different length from its orthologs, it was considered unlikely to be functional and therefore the result of an error in either sequencing or gene prediction. Focusing on 715 co-directional overlaps longer than 60 bp, we classified the erroneous ones into five categories: i) 5'-end extension of the downstream gene due to either a mispredicted start codon or a frameshift at the 5'-end of the gene (409 overlaps), ii) fragmentation of a gene caused by a frameshift (163), iii) 3'-end extension of the upstream gene due to either a frameshift at the 3'-end of the gene or a point mutation at the stop codon (68), iv) redundant gene predictions (4), v) 5'- and 3'-end extension, which is a combination of i) and iii) (71). We also studied 75 divergent overlaps that could be classified as misannotations of group i). Nevertheless, we found some long convergent overlaps (54) that might be true overlaps, although a larger share of the convergent overlaps (124) could be classified as group iii).
CONCLUSION: Among the 968 overlaps longer than 60 bp that we analysed, we did not find a single real one among the co-directional and divergent orientations, and we concluded that there had been an excessive number of misannotations. Only the convergent orientation seems to permit some long overlaps, although convergent overlaps are also hampered by misannotations. We propose a simple rule to flag these erroneous gene length predictions to facilitate automatic annotation.
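The first step described in the abstract, finding gene pairs that overlap by at least 60 bp and sorting them into co-directional, divergent and convergent orientations, can be sketched as follows. This is only an illustrative sketch, not the authors' pipeline: the `Gene` record, the function names, the example coordinates and the choice of an inclusive 60 bp cutoff are all assumptions made here, and the ortholog length comparison and the flagging rule mentioned in the abstract are not reproduced because the record does not state them.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Gene:
    """Hypothetical minimal gene record (not taken from the paper)."""
    name: str
    start: int   # 1-based inclusive coordinate, always start < end
    end: int     # 1-based inclusive coordinate
    strand: str  # "+" or "-"


def overlap_length(a: Gene, b: Gene) -> int:
    """Number of base pairs shared by two genes (0 if they do not overlap)."""
    return max(0, min(a.end, b.end) - max(a.start, b.start) + 1)


def orientation(a: Gene, b: Gene) -> str:
    """Classify a partially overlapping pair by strand arrangement.

    Same strand         -> co-directional
    left "+", right "-" -> convergent (3' ends overlap)
    left "-", right "+" -> divergent  (5' ends overlap)
    Fully nested genes are not treated specially in this sketch.
    """
    left, right = sorted((a, b), key=lambda g: (g.start, g.end))
    if left.strand == right.strand:
        return "co-directional"
    return "convergent" if left.strand == "+" else "divergent"


def long_overlaps(genes: List[Gene], min_overlap: int = 60) -> List[Tuple[str, str, int, str]]:
    """Return (gene, gene, overlap in bp, orientation) for overlaps >= min_overlap.

    The inclusive 60 bp default is an assumption based on the abstract.
    """
    ordered = sorted(genes, key=lambda g: (g.start, g.end))
    hits = []
    for i, a in enumerate(ordered):
        for b in ordered[i + 1:]:
            if b.start > a.end:
                break  # genes are sorted by start, so no later gene can overlap a
            n = overlap_length(a, b)
            if n >= min_overlap:
                hits.append((a.name, b.name, n, orientation(a, b)))
    return hits


# Made-up coordinates, for illustration only.
example_genes = [
    Gene("geneA", 1, 300, "+"),
    Gene("geneB", 230, 600, "+"),
    Gene("geneC", 580, 900, "-"),
    Gene("geneD", 1000, 1500, "+"),
    Gene("geneE", 1400, 2000, "-"),
    Gene("geneF", 3000, 3500, "-"),
    Gene("geneG", 3420, 4000, "+"),
]

for hit in long_overlaps(example_genes):
    print(hit)
```

Pairs reported by such a scan would only be candidates; following the abstract, each gene in a pair would then be compared in length with its orthologs before the overlap is judged real or a misannotation.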
format Text
id pubmed-2478687
institution National Center for Biotechnology Information
language English
publishDate 2008
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-2478687 2008-07-22 Large gene overlaps in prokaryotic genomes: result of functional constraints or mispredictions? Pallejà, Albert Harrington, Eoghan D Bork, Peer BMC Genomics Research Article BioMed Central 2008-07-15 /pmc/articles/PMC2478687/ /pubmed/18627618 http://dx.doi.org/10.1186/1471-2164-9-335 Text en Copyright © 2008 Pallejà et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Large gene overlaps in prokaryotic genomes: result of functional constraints or mispredictions?
topic Research Article