Quantitative sequence-function relationships in proteins based on gene ontology
BACKGROUND: The relationship between divergence of amino-acid sequence and divergence of function among homologous proteins is complex. The assumption that homologs share function – the basis of transfer of annotations in databases – must therefore be regarded with caution. Here, we present a quanti...
Main authors: | Sangar, Vineet; Blankenberg, Daniel J; Altman, Naomi; Lesk, Arthur M |
---|---|
Format: | Text |
Language: | English |
Published: | BioMed Central 2007 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1976327/ https://www.ncbi.nlm.nih.gov/pubmed/17686158 http://dx.doi.org/10.1186/1471-2105-8-294 |
_version_ | 1782135071672631296 |
---|---|
author | Sangar, Vineet Blankenberg, Daniel J Altman, Naomi Lesk, Arthur M |
author_facet | Sangar, Vineet Blankenberg, Daniel J Altman, Naomi Lesk, Arthur M |
author_sort | Sangar, Vineet |
collection | PubMed |
description | BACKGROUND: The relationship between divergence of amino-acid sequence and divergence of function among homologous proteins is complex. The assumption that homologs share function, which is the basis of transfer of annotations in databases, must therefore be regarded with caution. Here, we present a quantitative study of sequence and function divergence, based on the Gene Ontology classification of function. We determined the relationship between sequence divergence and function divergence in 6828 protein families from the PFAM database. Within families there is a broad range of sequence similarity, from very closely related proteins (for instance, orthologs in different mammals) to very distantly related proteins at the limit of reliable recognition of homology. RESULTS: We correlated the divergence in sequences, determined from pairwise alignments, with the divergence in function, determined by path lengths in the Gene Ontology graph, taking into account the fact that many proteins have multiple functions. Our results show that, among homologous proteins, the proportion of divergent functions decreases dramatically above a threshold of sequence similarity at about 50% residue identity. For proteins with more than 50% residue identity, transfer of annotation between homologs will lead to an erroneous attribution of a totally dissimilar function in fewer than 6% of cases. This means that for very similar proteins (about 50% identical residues) the chance of completely incorrect annotation is low; however, because of the phenomenon of recruitment, it is still non-zero. CONCLUSION: Our results describe general features of the evolution of protein function, and serve as a guide to the reliability of annotation transfer, based on the closeness of the relationship between a new protein and its nearest annotated relative. |
format | Text |
id | pubmed-1976327 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2007 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-1976327 2007-09-13 Quantitative sequence-function relationships in proteins based on gene ontology Sangar, Vineet Blankenberg, Daniel J Altman, Naomi Lesk, Arthur M BMC Bioinformatics Research Article BioMed Central 2007-08-08 /pmc/articles/PMC1976327/ /pubmed/17686158 http://dx.doi.org/10.1186/1471-2105-8-294 Text en Copyright © 2007 Sangar et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Research Article Sangar, Vineet Blankenberg, Daniel J Altman, Naomi Lesk, Arthur M Quantitative sequence-function relationships in proteins based on gene ontology |
title | Quantitative sequence-function relationships in proteins based on gene ontology |
title_full | Quantitative sequence-function relationships in proteins based on gene ontology |
title_fullStr | Quantitative sequence-function relationships in proteins based on gene ontology |
title_full_unstemmed | Quantitative sequence-function relationships in proteins based on gene ontology |
title_short | Quantitative sequence-function relationships in proteins based on gene ontology |
title_sort | quantitative sequence-function relationships in proteins based on gene ontology |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1976327/ https://www.ncbi.nlm.nih.gov/pubmed/17686158 http://dx.doi.org/10.1186/1471-2105-8-294 |
work_keys_str_mv | AT sangarvineet quantitativesequencefunctionrelationshipsinproteinsbasedongeneontology AT blankenbergdanielj quantitativesequencefunctionrelationshipsinproteinsbasedongeneontology AT altmannaomi quantitativesequencefunctionrelationshipsinproteinsbasedongeneontology AT leskarthurm quantitativesequencefunctionrelationshipsinproteinsbasedongeneontology |
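
The RESULTS portion of the description above names two measurements: sequence divergence from pairwise alignments, and function divergence from path lengths in the Gene Ontology graph, allowing for proteins with several functions. The sketch below illustrates both ideas on toy data. It is not the authors' implementation. The term names in `TOY_PARENTS`, the choice to ignore gapped columns when computing identity, the treatment of ontology edges as undirected, and the use of the minimum path length over all term pairs for multi-function proteins are all assumptions made for illustration only.

```python
from collections import deque
from itertools import product


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identical residues over columns where neither sequence has a gap.

    Assumption: both inputs are rows of the same pairwise alignment, with '-'
    marking gaps. Other identity definitions (e.g. per shorter sequence) exist.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != "-" and y != "-"]
    if not columns:
        return 0.0
    matches = sum(1 for x, y in columns if x == y)
    return 100.0 * matches / len(columns)


# Hypothetical miniature ontology (child -> parents); not real GO identifiers.
TOY_PARENTS = {
    "root": [],
    "binding": ["root"],
    "catalysis": ["root"],
    "kinase": ["catalysis"],
    "protein_kinase": ["kinase"],
    "hydrolase": ["catalysis"],
}


def undirected_neighbors(parents):
    """Build an undirected adjacency map from a child -> parents mapping."""
    adj = {term: set() for term in parents}
    for child, term_parents in parents.items():
        for parent in term_parents:
            adj[child].add(parent)
            adj[parent].add(child)
    return adj


def path_length(adj, start, goal):
    """Breadth-first search: number of edges on a shortest path, or None."""
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        term, dist = queue.popleft()
        for nxt in adj[term]:
            if nxt == goal:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


def function_distance(adj, terms_a, terms_b):
    """Smallest path length over all term pairs (assumed rule for multi-function proteins)."""
    lengths = [path_length(adj, a, b) for a, b in product(terms_a, terms_b)]
    reachable = [n for n in lengths if n is not None]
    return min(reachable) if reachable else None


if __name__ == "__main__":
    adj = undirected_neighbors(TOY_PARENTS)
    print(percent_identity("MKT-LV", "MKSALV"))
    print(function_distance(adj, {"kinase", "binding"}, {"hydrolase"}))
```

With pairs of homologs scored this way, one could bin them by percent identity and inspect how the function distance is distributed in each bin, which is the kind of relationship the abstract reports around the 50% identity threshold.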