Gene fusions and gene duplications: relevance to genomic annotation and functional analysis

BACKGROUND: Escherichia coli, a model organism, provides information for the annotation of other genomes. Our analysis of its genome has shown that proteins encoded by fused genes need special attention. Such composite (multimodular) proteins consist of two or more components (modules) encoding distinct f...

Bibliographic Details
Main authors: Serres, Margrethe H, Riley, Monica
Format: Text
Language: English
Published: BioMed Central 2005
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC555942/
https://www.ncbi.nlm.nih.gov/pubmed/15757509
http://dx.doi.org/10.1186/1471-2164-6-33
author Serres, Margrethe H
Riley, Monica
collection PubMed
description BACKGROUND: Escherichia coli, a model organism, provides information for the annotation of other genomes. Our analysis of its genome has shown that proteins encoded by fused genes need special attention. Such composite (multimodular) proteins consist of two or more components (modules) encoding distinct functions. Multimodular proteins have been found to complicate both annotation and the generation of sequence-similar groups. Previous work overstated the number of multimodular proteins in E. coli. This work corrects the identification of modules by including sequence information from proteins in 50 sequenced microbial genomes.
RESULTS: Multimodular E. coli K-12 proteins were identified from sequence similarities between their component modules and non-fused proteins in 50 genomes, and from the literature. We found 109 multimodular proteins in E. coli containing either two or three modules. Most modules had standalone sequence relatives in other genomes. The separated modules, together with all the single (un-fused) proteins, constitute the sum of all unimodular proteins of E. coli. Pairwise sequence relationships among all E. coli unimodular proteins generated 490 sequence-similar paralogous groups. Groups ranged in size from 2 to 92 members and had varying degrees of relatedness among their members. Some E. coli enzyme groups were compared to homologs in other bacterial genomes.
CONCLUSION: The deleterious effects of multimodular proteins on annotation and on the formation of groups of paralogs are emphasized. To improve annotation results, all multimodular proteins in an organism should be detected and, when known, each function should be connected with its location in the sequence of the protein. When transferring functions by sequence similarity, alignment locations must be noted, particularly when alignments cover only part of the sequences, in order to enable transfer of the correct function. Separating multimodular proteins into module units makes it possible to generate protein groups related by both sequence and function, avoiding the mixing of unrelated sequences. Organisms differ in the sizes of their groups of sequence-related proteins. A sample comparison of orthologs to selected E. coli paralogous groups correlates with known physiological and taxonomic relationships between the organisms.
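The recommendation to note alignment locations when transferring functions can be illustrated with a minimal sketch. It is not the authors' procedure: the Module record, the function names, the example coordinates, and the 0.8 coverage threshold are all assumptions made for illustration. The idea is simply that a function is transferred only if the alignment covers most of the module that carries it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Module:
    """One module of a multimodular protein (hypothetical record layout)."""
    start: int      # first residue of the module, 1-based inclusive
    end: int        # last residue of the module, 1-based inclusive
    function: str   # function assigned to this module


def covered_fraction(module: Module, aln_start: int, aln_end: int) -> float:
    """Fraction of the module's residues that lie inside the alignment."""
    overlap = min(module.end, aln_end) - max(module.start, aln_start) + 1
    length = module.end - module.start + 1
    return max(0, overlap) / length


def transferable_functions(modules, aln_start, aln_end, min_coverage=0.8):
    """Return functions of modules that the alignment covers sufficiently.

    min_coverage is an illustrative threshold, not a value from the source.
    """
    return [
        m.function
        for m in modules
        if covered_fraction(m, aln_start, aln_end) >= min_coverage
    ]


# Hypothetical two-module protein; coordinates and labels are invented.
fused = [Module(1, 300, "function A"), Module(320, 800, "function B")]

# An alignment over residues 330-790 hits only the second module,
# so only that module's function should be transferred.
print(transferable_functions(fused, 330, 790))
```

An alignment spanning residues 1-800 would return both functions, while one that only straddles the junction between the modules (for example 250-400) would return neither, which is the case where a whole-protein annotation transfer would be misleading.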
format Text
id pubmed-555942
institution National Center for Biotechnology Information
language English
publishDate 2005
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-555942 2005-04-03 Gene fusions and gene duplications: relevance to genomic annotation and functional analysis. Serres, Margrethe H; Riley, Monica. BMC Genomics. Research Article. BioMed Central 2005-03-09 /pmc/articles/PMC555942/ /pubmed/15757509 http://dx.doi.org/10.1186/1471-2164-6-33 Text en
Copyright © 2005 Serres and Riley; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Gene fusions and gene duplications: relevance to genomic annotation and functional analysis
topic Research Article