
Estimates of statistical significance for comparison of individual positions in multiple sequence alignments



Bibliographic details
Main authors: Sadreyev, Ruslan I; Grishin, Nick V
Format: Text
Language: English
Published: BioMed Central, 2004
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC516024/
https://www.ncbi.nlm.nih.gov/pubmed/15296518
http://dx.doi.org/10.1186/1471-2105-5-106
Description:
BACKGROUND: Profile-based analysis of multiple sequence alignments (MSA) allows for accurate comparison of protein families. Here, we address the problems of detecting statistically confident dissimilarities between (1) an MSA position and a set of predicted residue frequencies, and (2) two MSA positions. These problems are important for (i) evaluation and optimization of methods predicting residue occurrence at protein positions; (ii) detection of potentially misaligned regions in automatically produced alignments and their further refinement; and (iii) detection of sites that determine functional or structural specificity in two related families.

RESULTS: For problems (1) and (2), we propose analytical estimates of P-values and apply them to the detection of significant positional dissimilarities in various experimental situations. (a) We compare structure-based predictions of residue propensities at a protein position to the actual residue frequencies in the MSA of homologs. (b) We evaluate our method by its ability to detect erroneous position matches produced by an automatic sequence aligner. (c) We compare MSA positions that correspond to residues aligned by automatic structure aligners. (d) We compare MSA positions that are aligned by high-quality manual superposition of structures. Detected dissimilarities reveal shortcomings of the automatic methods for residue frequency prediction and alignment construction. For the high-quality structural alignments, the dissimilarities suggest sites of potential functional or structural importance.

CONCLUSION: The proposed computational method is of significant potential value for the analysis of protein families.
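The abstract states the two comparison problems but does not give the analytical P-value estimates themselves. To illustrate what a "statistically confident dissimilarity" means for each problem, the sketch below swaps in a generic resampling test: a Monte Carlo goodness-of-fit test for problem (1) and a permutation test for problem (2), both scored by the total variation distance between residue-frequency vectors. This is not the authors' method. The function names, the distance statistic, and the assumption of raw, unweighted counts from independent sequences are all hypothetical choices made for illustration; real MSA columns would need sequence weighting and pseudocounts.

```python
import random
from collections import Counter

# Illustrative resampling sketch, NOT the analytical estimates from the paper.
# Assumes raw unweighted counts and independent sequences (hypothetical simplification).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
_AA_SET = frozenset(AMINO_ACIDS)
_EPS = 1e-12  # tolerance so floating-point ties count as "at least as extreme"


def residues_of(column):
    """Keep only the 20 standard amino-acid letters of one MSA column (drops gaps)."""
    return [r for r in column if r in _AA_SET]


def frequencies(residues):
    """Unweighted residue frequencies (no sequence weighting, no pseudocounts)."""
    if not residues:
        raise ValueError("column has no standard residues")
    counts = Counter(residues)
    n = len(residues)
    return {a: counts[a] / n for a in AMINO_ACIDS}


def tv_distance(p, q):
    """Total variation distance between two residue-frequency dicts."""
    return 0.5 * sum(abs(p.get(a, 0.0) - q.get(a, 0.0)) for a in AMINO_ACIDS)


def pvalue_vs_predicted(column, predicted, n_sim=10_000, seed=0):
    """Problem (1): Monte Carlo P-value that the column was sampled from `predicted`."""
    total = sum(predicted.get(a, 0.0) for a in AMINO_ACIDS)
    if total <= 0:
        raise ValueError("predicted frequencies must have positive mass")
    ref = {a: predicted.get(a, 0.0) / total for a in AMINO_ACIDS}
    residues = residues_of(column)
    observed = tv_distance(frequencies(residues), ref)
    rng = random.Random(seed)
    letters = list(AMINO_ACIDS)
    weights = [ref[a] for a in letters]
    hits = 0
    for _ in range(n_sim):
        # Draw a column of the same size from the predicted distribution.
        sample = rng.choices(letters, weights=weights, k=len(residues))
        if tv_distance(frequencies(sample), ref) >= observed - _EPS:
            hits += 1
    # Add-one correction keeps the Monte Carlo P-value strictly positive.
    return (hits + 1) / (n_sim + 1)


def pvalue_two_columns(col_a, col_b, n_perm=10_000, seed=0):
    """Problem (2): permutation P-value that two columns share one distribution."""
    a, b = residues_of(col_a), residues_of(col_b)
    observed = tv_distance(frequencies(a), frequencies(b))
    pooled = a + b
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # randomly reassign residues between the two columns
        d = tv_distance(frequencies(pooled[:len(a)]), frequencies(pooled[len(a):]))
        if d >= observed - _EPS:
            hits += 1
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Hypothetical toy columns, not data from the paper.
    hydrophobic = "LLLIIVVLLIVLLIVVLIL"
    charged = "DEEKKDERKDEEKRDKEED"
    print(pvalue_two_columns(hydrophobic, hydrophobic))  # 1.0: identical columns
    print(pvalue_two_columns(hydrophobic, charged))      # small: disjoint residue sets
```

Resampling trades the speed of an analytical estimate for generality: any distance statistic can be plugged in, but each P-value costs thousands of simulations, and the add-one correction means the smallest reportable P-value is 1/(n_sim + 1).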
Journal: BMC Bioinformatics (Research Article), 2004-08-05
Copyright © 2004 Sadreyev and Grishin; licensee BioMed Central Ltd.