
Background frequencies for residue variability estimates: BLOSUM revisited

BACKGROUND: Shannon entropy applied to columns of multiple sequence alignments as a score of residue conservation has proven one of the most fruitful ideas in bioinformatics. This straightforward and intuitively appealing measure clearly shows the regions of a protein under increased evolutionary pr...


Bibliographic Details
Main Authors: Mihalek, I, Reš, I, Lichtarge, O
Format: Text
Language: English
Published: BioMed Central 2007
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2267808/
https://www.ncbi.nlm.nih.gov/pubmed/18162129
http://dx.doi.org/10.1186/1471-2105-8-488
_version_ 1782151663070478336
author Mihalek, I
Reš, I
Lichtarge, O
collection PubMed
description BACKGROUND: Shannon entropy applied to columns of multiple sequence alignments as a score of residue conservation has proven to be one of the most fruitful ideas in bioinformatics. This straightforward and intuitively appealing measure clearly shows the regions of a protein under increased evolutionary pressure, highlighting their functional importance. The inability of the column entropy to differentiate between residue types, however, limits its resolving power. RESULTS: In this work we suggest generalizing Shannon's expression to a function with similar mathematical properties that also incorporates the observed propensities of residue types to mutate into one another. To do that, we revisit the original construction of BLOSUM matrices and reinterpret them as mutation probability matrices. These probabilities are then used as background frequencies in the revised residue conservation measure. CONCLUSION: We show that joint entropy with BLOSUM-proportional probabilities as a reference distribution enables detection of protein functional sites comparable in quality to that of a time-costly maximum-likelihood evolution simulation method (rate4site), and offers greater resolution than the Shannon entropy alone, particularly when the available sequences are of narrow evolutionary scope.
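The column-entropy score that the abstract starts from can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function names `column_entropy` and `relative_entropy` are hypothetical, gaps are assumed to have been removed from the column, and the second function is the ordinary Kullback-Leibler divergence against a caller-supplied background distribution rather than the BLOSUM-derived joint measure proposed in the paper.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (in nats) of the residue frequencies in one alignment column.

    `column` is a string or sequence of one-letter residue codes, gaps removed.
    """
    counts = Counter(column)
    total = len(column)
    # sum of p * ln(1/p); equals 0.0 for a fully conserved column
    return sum((n / total) * math.log(total / n) for n in counts.values())


def relative_entropy(column, background):
    """Kullback-Leibler divergence of the column frequencies from `background`.

    `background` maps residue -> reference probability (an assumption of this
    sketch; the paper derives its reference distribution from BLOSUM data).
    """
    counts = Counter(column)
    total = len(column)
    return sum(
        (n / total) * math.log((n / total) / background[res])
        for res, n in counts.items()
    )


if __name__ == "__main__":
    # A conservative I/V column and a non-conservative I/D column
    # receive the same Shannon entropy, ln 2.
    print(column_entropy("IIVV"), column_entropy("IIDD"))
```

The two example columns illustrate the limitation named in the abstract: plain column entropy sees only how the residue counts are distributed, not which residue types are involved, so an isoleucine/valine column and an isoleucine/aspartate column score identically.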
format Text
id pubmed-2267808
institution National Center for Biotechnology Information
language English
publishDate 2007
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-2267808 2008-03-17 Background frequencies for residue variability estimates: BLOSUM revisited Mihalek, I Reš, I Lichtarge, O BMC Bioinformatics Research Article BioMed Central 2007-12-27 /pmc/articles/PMC2267808/ /pubmed/18162129 http://dx.doi.org/10.1186/1471-2105-8-488 Text en Copyright © 2007 Mihalek et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Background frequencies for residue variability estimates: BLOSUM revisited
topic Research Article