
Protein sequence alignment with family-specific amino acid similarity matrices

BACKGROUND: Alignment of amino acid sequences by means of dynamic programming is a cornerstone sequence comparison method. The quality of alignments produced by dynamic programming critically depends on the choice of the alignment scoring function. Therefore, for a specific alignment problem one nee...


Bibliographic Details
Main Author: Kuznetsov, Igor B
Format: Online Article Text
Language: English
Published: BioMed Central 2011
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3201029/
https://www.ncbi.nlm.nih.gov/pubmed/21846354
http://dx.doi.org/10.1186/1756-0500-4-296
author Kuznetsov, Igor B
collection PubMed
description BACKGROUND: Alignment of amino acid sequences by means of dynamic programming is a cornerstone sequence comparison method. The quality of alignments produced by dynamic programming critically depends on the choice of the alignment scoring function. Therefore, for a specific alignment problem one needs a way of selecting the best performing scoring function. This work is focused on the issue of finding optimized protein family- and fold-specific scoring functions for global similarity matrix-based sequence alignment. FINDINGS: I utilize a comprehensive set of reference alignments obtained from structural superposition of homologous and analogous proteins to design a quantitative statistical framework for evaluating the performance of alignment scoring functions in global pairwise sequence alignment. This framework is applied to study how existing general-purpose amino acid similarity matrices perform on individual protein families and structural folds, and to compare them to family-specific and fold-specific matrices derived in this work. I describe an adaptive alignment procedure that automatically selects an appropriate similarity matrix and optimized gap penalties based on the properties of the sequences being aligned. CONCLUSIONS: The results of this work indicate that using family-specific similarity matrices significantly improves the quality of the alignment of homologous sequences over the traditional sequence alignment based on a single general-purpose similarity matrix. However, using fold-specific similarity matrices can only marginally improve sequence alignment of proteins that share the same structural fold but do not share a common evolutionary origin. The family-specific matrices derived in this work and the optimized gap penalties are available at http://taurus.crc.albany.edu/fsm.
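The description above concerns global pairwise alignment by dynamic programming, where the result depends on the similarity matrix and the gap penalties. A minimal sketch of that dependence follows. It is not the author's implementation: the function names `toy_matrix` and `global_align_score`, the toy match/mismatch matrix, and the linear (non-affine) gap penalty are all assumptions made for illustration.

```python
from typing import Dict, Tuple

Matrix = Dict[Tuple[str, str], int]


def toy_matrix(alphabet: str, match: int, mismatch: int) -> Matrix:
    """Build a hypothetical similarity matrix: one score for identical
    residues and one for all other pairs (a stand-in for a real matrix)."""
    return {(x, y): (match if x == y else mismatch)
            for x in alphabet for y in alphabet}


def global_align_score(a: str, b: str, sim: Matrix, gap: int) -> int:
    """Needleman-Wunsch global alignment score.

    Assumes a linear gap penalty: each gapped position costs `gap`.
    """
    n, m = len(a), len(b)
    # dp[i][j] holds the best score for aligning a[:i] with b[:j].
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = dp[i - 1][0] - gap  # a[:i] aligned entirely to gaps
    for j in range(1, m + 1):
        dp[0][j] = dp[0][j - 1] - gap  # b[:j] aligned entirely to gaps
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            pair = dp[i - 1][j - 1] + sim[(a[i - 1], b[j - 1])]
            gap_in_b = dp[i - 1][j] - gap  # a[i-1] aligned to a gap
            gap_in_a = dp[i][j - 1] - gap  # b[j-1] aligned to a gap
            dp[i][j] = max(pair, gap_in_b, gap_in_a)
    return dp[n][m]


if __name__ == "__main__":
    general = toy_matrix("ACDE", match=2, mismatch=-1)
    strict = toy_matrix("ACDE", match=2, mismatch=-3)
    # The same pair of sequences scores differently under each matrix.
    print(global_align_score("ACDE", "ACED", general, gap=2))
    print(global_align_score("ACDE", "ACED", strict, gap=2))
```

Swapping `sim` and `gap` is the only change needed to compare scoring functions on the same sequence pair, which is the kind of comparison the description refers to. A real evaluation would use published or family-specific matrices and compare the resulting alignments against structure-based reference alignments.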
format Online
Article
Text
id pubmed-3201029
institution National Center for Biotechnology Information
language English
publishDate 2011
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3201029 2011-10-26 Protein sequence alignment with family-specific amino acid similarity matrices Kuznetsov, Igor B BMC Res Notes Technical Note
BioMed Central 2011-08-16 /pmc/articles/PMC3201029/ /pubmed/21846354 http://dx.doi.org/10.1186/1756-0500-4-296 Text en Copyright ©2011 Kuznetsov et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Protein sequence alignment with family-specific amino acid similarity matrices
topic Technical Note