
Improving pairwise sequence alignment accuracy using near-optimal protein sequence alignments

BACKGROUND: While the pairwise alignments produced by sequence similarity searches are a powerful tool for identifying homologous proteins (proteins that share a common ancestor and a similar structure), pairwise sequence alignments often fail to represent accurately the structural alignments inferr...


Bibliographic Details
Main authors:	Sierk, Michael L, Smoot, Michael E, Bass, Ellen J, Pearson, William R
Format:	Text
Language:	English
Published:	BioMed Central 2010
Subjects:	Methodology article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2850363/ https://www.ncbi.nlm.nih.gov/pubmed/20307279 http://dx.doi.org/10.1186/1471-2105-11-146

author	Sierk, Michael L Smoot, Michael E Bass, Ellen J Pearson, William R
collection	PubMed
description	BACKGROUND: While the pairwise alignments produced by sequence similarity searches are a powerful tool for identifying homologous proteins (proteins that share a common ancestor and a similar structure), pairwise sequence alignments often fail to represent accurately the structural alignments inferred from three-dimensional coordinates. Since sequence alignment algorithms produce optimal alignments, the best structural alignments must reflect suboptimal sequence alignment scores. Thus, we have examined a range of suboptimal sequence alignments and a range of scoring parameters to understand better which sequence alignments are likely to be more structurally accurate.
RESULTS: We compared near-optimal protein sequence alignments produced by the Zuker algorithm and a set of probabilistic alignments produced by the probA program with structural alignments produced by four different structure alignment algorithms. There is significant overlap between the solution spaces of structural alignments and both the near-optimal sequence alignments produced by commonly used scoring parameters for sequences that share significant sequence similarity (E-values < 10^-5) and the ensemble of probA alignments. We constructed a logistic regression model incorporating three input variables derived from sets of near-optimal alignments: robustness, edge frequency, and maximum bits-per-position. A ROC analysis shows that this model classifies amino acid pairs (edges in the alignment path graph) according to their likelihood of appearing in structural alignments more accurately than the robustness score alone. We investigated various trimming protocols for removing incorrect edges from the optimal sequence alignment; the most effective protocol is to remove matches from the semi-global optimal alignment that are outside the boundaries of the local alignment, although trimming according to the model-generated probabilities achieves a similar level of improvement. The model can also be used to generate novel alignments by using the probabilities in lieu of a scoring matrix. These alignments are typically better than the optimal sequence alignment, and include novel correct structural edges. We find that the probA alignments sample a larger variety of alignments than the Zuker set, which more frequently results in alignments that are closer to the structural alignments, but that using the probA alignments as input to the regression model does not increase performance.
CONCLUSIONS: The pool of suboptimal pairwise protein sequence alignments substantially overlaps structure-based alignments for pairs with statistically significant similarity, and a regression model based on information contained in this alignment pool improves the accuracy of pairwise alignments with respect to structure-based alignments.
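The edge-classification and trimming steps described in the abstract can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration: the abstract does not define how robustness, edge frequency, or maximum bits-per-position are computed, and it gives no fitted coefficients. The sketch takes edge frequency to be the fraction of pooled alignments that contain an edge, treats robustness and maximum bits-per-position as values supplied by the caller, and uses a hypothetical probability threshold of 0.5 for trimming.

```python
import math

def edge_frequency(alignments):
    """Fraction of alignments in the pool that contain each edge (i, j).

    Assumed definition, for illustration only; the record does not give one.
    """
    counts = {}
    for alignment in alignments:
        for edge in set(alignment):
            counts[edge] = counts.get(edge, 0) + 1
    total = len(alignments)
    return {edge: count / total for edge, count in counts.items()}

def edge_probability(robustness, frequency, max_bits, coef):
    """Logistic model over the three inputs named in the abstract.

    coef = (intercept, b_robustness, b_frequency, b_max_bits); the values
    passed in are hypothetical, not the fitted values from the article.
    """
    z = coef[0] + coef[1] * robustness + coef[2] * frequency + coef[3] * max_bits
    return 1.0 / (1.0 + math.exp(-z))

def trim(optimal_alignment, probabilities, threshold=0.5):
    """Keep only edges whose model probability reaches the threshold."""
    return [e for e in optimal_alignment if probabilities.get(e, 0.0) >= threshold]

# Toy pool of three near-optimal alignments, each a list of (i, j) residue pairs.
pool = [
    [(1, 1), (2, 2), (3, 3)],
    [(1, 1), (2, 2), (3, 4)],
    [(1, 1), (2, 3), (3, 4)],
]
freq = edge_frequency(pool)
```

With this toy pool, the edge (1, 1) appears in every alignment and so has frequency 1.0, while (3, 3) appears in only one of three. In practice the coefficients would be fitted against structural alignments, as the abstract describes, rather than chosen by hand.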
format	Text
id	pubmed-2850363
institution	National Center for Biotechnology Information
language	English
publishDate	2010
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-2850363 2010-04-07 Improving pairwise sequence alignment accuracy using near-optimal protein sequence alignments Sierk, Michael L Smoot, Michael E Bass, Ellen J Pearson, William R BMC Bioinformatics Methodology article BioMed Central 2010-03-22 /pmc/articles/PMC2850363/ /pubmed/20307279 http://dx.doi.org/10.1186/1471-2105-11-146 Text en Copyright ©2010 Sierk et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title	Improving pairwise sequence alignment accuracy using near-optimal protein sequence alignments
topic	Methodology article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2850363/ https://www.ncbi.nlm.nih.gov/pubmed/20307279 http://dx.doi.org/10.1186/1471-2105-11-146
