Kalign – an accurate and fast multiple sequence alignment algorithm

BACKGROUND: The alignment of multiple protein sequences is a fundamental step in the analysis of biological data. It has traditionally been applied to analyzing protein families for conserved motifs, phylogeny, structural properties, and to improve sensitivity in homology searching. The availability...

Bibliographic Details
Main authors: Lassmann, Timo; Sonnhammer, Erik LL
Format: Text
Language: English
Published: BioMed Central 2005
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1325270/
https://www.ncbi.nlm.nih.gov/pubmed/16343337
http://dx.doi.org/10.1186/1471-2105-6-298
author Lassmann, Timo
Sonnhammer, Erik LL
collection PubMed
description BACKGROUND: The alignment of multiple protein sequences is a fundamental step in the analysis of biological data. It has traditionally been applied to analyzing protein families for conserved motifs, phylogeny, structural properties, and to improve sensitivity in homology searching. The availability of complete genome sequences has increased the demands on multiple sequence alignment (MSA) programs. Current MSA methods suffer from being either too inaccurate or too computationally expensive to be applied effectively in large-scale comparative genomics. RESULTS: We developed Kalign, a method employing the Wu-Manber string-matching algorithm, to improve both the accuracy and speed of multiple sequence alignment. We compared the speed and accuracy of Kalign to other popular methods using Balibase, Prefab, and a new large test set. Kalign was as accurate as the best other methods on small alignments, but significantly more accurate when aligning large and distantly related sets of sequences. In our comparisons, Kalign was about 10 times faster than ClustalW and, depending on the alignment size, up to 50 times faster than popular iterative methods. CONCLUSION: Kalign is a fast and robust alignment method. It is especially well suited for the increasingly important task of aligning large numbers of sequences.
format Text
id pubmed-1325270
institution National Center for Biotechnology Information
language English
publishDate 2005
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-1325270 2006-01-07 Kalign – an accurate and fast multiple sequence alignment algorithm. Lassmann, Timo; Sonnhammer, Erik LL. BMC Bioinformatics. Software. BioMed Central 2005-12-12 /pmc/articles/PMC1325270/ /pubmed/16343337 http://dx.doi.org/10.1186/1471-2105-6-298 Text en Copyright © 2005 Lassmann and Sonnhammer; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Kalign – an accurate and fast multiple sequence alignment algorithm
topic Software
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1325270/
https://www.ncbi.nlm.nih.gov/pubmed/16343337
http://dx.doi.org/10.1186/1471-2105-6-298