
SE: an algorithm for deriving sequence alignment from a pair of superimposed structures

BACKGROUND: Generating sequence alignments from superimposed structures is an important part of many structure comparison programs. The accuracy of the alignment affects structure recognition, classification and possibly function prediction. Many programs use a dynamic programming algorithm to gener...


Bibliographic Details
Main Authors:	Tai, Chin-Hsien; Vincent, James J; Kim, Changhoon; Lee, Byungkook
Format:	Text
Language:	English
Published:	BioMed Central 2009
Subjects:	Research
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2648757/ https://www.ncbi.nlm.nih.gov/pubmed/19208141 http://dx.doi.org/10.1186/1471-2105-10-S1-S4

_version_	1782164981084585984
author	Tai, Chin-Hsien Vincent, James J Kim, Changhoon Lee, Byungkook
author_facet	Tai, Chin-Hsien Vincent, James J Kim, Changhoon Lee, Byungkook
author_sort	Tai, Chin-Hsien
collection	PubMed
description	BACKGROUND: Generating sequence alignments from superimposed structures is an important part of many structure comparison programs. The accuracy of the alignment affects structure recognition, classification and possibly function prediction. Many programs use a dynamic programming algorithm to generate the sequence alignment from superimposed structures. However, this procedure requires using a gap penalty and, depending on the value of the penalty used, can introduce spurious gaps and misalignments. Here we present a new algorithm, Seed Extension (SE), for generating the sequence alignment from a pair of superimposed structures. The SE algorithm first finds "seeds", which are the pairs of residues, one from each structure, that meet certain stringent criteria for being structurally equivalent. Three consecutive seeds form a seed segment, which is extended along the diagonal of the alignment matrix in both directions. Distance and the amino acid type similarity between the residues are used to resolve conflicts that arise during extension of more than one diagonal. The manually curated alignments in the Conserved Domain Database were used as the standard to assess the quality of the sequence alignments. RESULTS: SE gave an average accuracy of 95.9% over 582 pairs of superimposed proteins tested, while CHIMERA, LSQMAN, and DP extracted from SHEBA, which all use a dynamic programming algorithm, yielded 89.9%, 90.2% and 91.0%, respectively. For pairs of proteins with low sequence or structural similarity, SE produced alignments up to 18% more accurate on average than the next best scoring program. Improvement was most pronounced when the two superimposed structures contained equivalent helices or beta-strands that crossed at an angle. When the SE algorithm was implemented in SHEBA to replace the dynamic programming routine, the alignment accuracy improved by 10% on average for structure pairs with RMSD between 2 and 4 Å. SE also used considerably less CPU time than DP. CONCLUSION: The Seed Extension algorithm is fast and, without using a gap penalty, produces more accurate sequence alignments from superimposed structures than the three other programs tested, all of which use a dynamic programming algorithm.
format	Text
id	pubmed-2648757
institution	National Center for Biotechnology Information
language	English
publishDate	2009
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-2648757 2009-03-03 SE: an algorithm for deriving sequence alignment from a pair of superimposed structures Tai, Chin-Hsien Vincent, James J Kim, Changhoon Lee, Byungkook BMC Bioinformatics Research BACKGROUND: Generating sequence alignments from superimposed structures is an important part of many structure comparison programs. The accuracy of the alignment affects structure recognition, classification and possibly function prediction. Many programs use a dynamic programming algorithm to generate the sequence alignment from superimposed structures. However, this procedure requires using a gap penalty and, depending on the value of the penalty used, can introduce spurious gaps and misalignments. Here we present a new algorithm, Seed Extension (SE), for generating the sequence alignment from a pair of superimposed structures. The SE algorithm first finds "seeds", which are the pairs of residues, one from each structure, that meet certain stringent criteria for being structurally equivalent. Three consecutive seeds form a seed segment, which is extended along the diagonal of the alignment matrix in both directions. Distance and the amino acid type similarity between the residues are used to resolve conflicts that arise during extension of more than one diagonal. The manually curated alignments in the Conserved Domain Database were used as the standard to assess the quality of the sequence alignments. RESULTS: SE gave an average accuracy of 95.9% over 582 pairs of superimposed proteins tested, while CHIMERA, LSQMAN, and DP extracted from SHEBA, which all use a dynamic programming algorithm, yielded 89.9%, 90.2% and 91.0%, respectively. For pairs of proteins with low sequence or structural similarity, SE produced alignments up to 18% more accurate on average than the next best scoring program. Improvement was most pronounced when the two superimposed structures contained equivalent helices or beta-strands that crossed at an angle. When the SE algorithm was implemented in SHEBA to replace the dynamic programming routine, the alignment accuracy improved by 10% on average for structure pairs with RMSD between 2 and 4 Å. SE also used considerably less CPU time than DP. CONCLUSION: The Seed Extension algorithm is fast and, without using a gap penalty, produces more accurate sequence alignments from superimposed structures than the three other programs tested, all of which use a dynamic programming algorithm. BioMed Central 2009-01-30 /pmc/articles/PMC2648757/ /pubmed/19208141 http://dx.doi.org/10.1186/1471-2105-10-S1-S4 Text en Copyright © 2009 Tai et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle	Research Tai, Chin-Hsien Vincent, James J Kim, Changhoon Lee, Byungkook SE: an algorithm for deriving sequence alignment from a pair of superimposed structures
title	SE: an algorithm for deriving sequence alignment from a pair of superimposed structures
title_full	SE: an algorithm for deriving sequence alignment from a pair of superimposed structures
title_fullStr	SE: an algorithm for deriving sequence alignment from a pair of superimposed structures
title_full_unstemmed	SE: an algorithm for deriving sequence alignment from a pair of superimposed structures
title_short	SE: an algorithm for deriving sequence alignment from a pair of superimposed structures
title_sort	se: an algorithm for deriving sequence alignment from a pair of superimposed structures
topic	Research
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2648757/ https://www.ncbi.nlm.nih.gov/pubmed/19208141 http://dx.doi.org/10.1186/1471-2105-10-S1-S4
work_keys_str_mv	AT taichinhsien seanalgorithmforderivingsequencealignmentfromapairofsuperimposedstructures AT vincentjamesj seanalgorithmforderivingsequencealignmentfromapairofsuperimposedstructures AT kimchanghoon seanalgorithmforderivingsequencealignmentfromapairofsuperimposedstructures AT leebyungkook seanalgorithmforderivingsequencealignmentfromapairofsuperimposedstructures
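
The description field above outlines Seed Extension in three steps: find seeds, group three consecutive seeds into a seed segment, and extend each segment along its diagonal in both directions, resolving conflicts between diagonals. The Python sketch below illustrates that outline only; it is not the authors' implementation. The abstract does not state the seed criteria, so mutual nearest neighbours within a hypothetical seed_cutoff stand in for them, extend_cutoff is likewise an assumed parameter, and conflicts are resolved by distance alone, whereas the paper also uses amino acid type similarity. Inputs are assumed to be lists of C-alpha coordinates from two structures that are already superimposed.

```python
import math

def find_seeds(ca1, ca2, seed_cutoff=2.0):
    """Return residue pairs (i, j) that are mutual nearest neighbours and lie
    within seed_cutoff of each other (an assumed stand-in for the paper's
    'stringent criteria', which the abstract does not specify)."""
    if not ca1 or not ca2:
        return set()
    nearest1 = [min(range(len(ca2)), key=lambda j: math.dist(p, ca2[j])) for p in ca1]
    nearest2 = [min(range(len(ca1)), key=lambda i: math.dist(q, ca1[i])) for q in ca2]
    return {(i, j) for i, j in enumerate(nearest1)
            if nearest2[j] == i and math.dist(ca1[i], ca2[j]) <= seed_cutoff}

def seed_segments(seeds):
    """Return (i, j, length) for each run of at least three consecutive seeds
    lying on the same diagonal of the alignment matrix."""
    segments = []
    for i, j in sorted(seeds):
        if (i - 1, j - 1) in seeds:
            continue  # this seed is inside a run, not at its start
        length = 0
        while (i + length, j + length) in seeds:
            length += 1
        if length >= 3:
            segments.append((i, j, length))
    return segments

def extend(segment, ca1, ca2, extend_cutoff=4.0):
    """Extend a seed segment along its diagonal in both directions while the
    paired residues stay within extend_cutoff (assumed stopping rule)."""
    i, j, length = segment
    start_i, start_j = i, j
    while (start_i > 0 and start_j > 0
           and math.dist(ca1[start_i - 1], ca2[start_j - 1]) <= extend_cutoff):
        start_i -= 1
        start_j -= 1
    end_i, end_j = i + length - 1, j + length - 1
    while (end_i + 1 < len(ca1) and end_j + 1 < len(ca2)
           and math.dist(ca1[end_i + 1], ca2[end_j + 1]) <= extend_cutoff):
        end_i += 1
        end_j += 1
    return [(start_i + t, start_j + t) for t in range(end_i - start_i + 1)]

def align(ca1, ca2, seed_cutoff=2.0, extend_cutoff=4.0):
    """Return a sorted list of aligned residue index pairs."""
    candidates = set()
    for segment in seed_segments(find_seeds(ca1, ca2, seed_cutoff)):
        candidates.update(extend(segment, ca1, ca2, extend_cutoff))
    # Conflict resolution (simplified): closest pairs are accepted first, and a
    # pair is kept only if it preserves sequence order with every accepted pair.
    ordered = sorted(candidates, key=lambda p: (math.dist(ca1[p[0]], ca2[p[1]]), p))
    pairs = []
    for i, j in ordered:
        if all((a < i and b < j) or (a > i and b > j) for a, b in pairs):
            pairs.append((i, j))
    return sorted(pairs)
```

Note that no gap penalty appears anywhere in the sketch: unaligned residues are simply those that no extended segment reaches, which is the property the abstract contrasts with dynamic programming.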
