Kalign 3: multiple sequence alignment of large datasets

Bibliographic Details
Main Author: Lassmann, Timo
Format: Online Article Text
Language: English
Published: Oxford University Press 2020
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7703769/
https://www.ncbi.nlm.nih.gov/pubmed/31665271
http://dx.doi.org/10.1093/bioinformatics/btz795
Description
Summary: MOTIVATION: Kalign is an efficient multiple sequence alignment (MSA) program capable of aligning thousands of protein or nucleotide sequences. However, current alignment problems involving large numbers of sequences are exceeding Kalign’s original design specifications. Here we present a completely re-written and updated version to meet current and future alignment challenges. RESULTS: Kalign now uses a SIMD (single instruction, multiple data) accelerated version of the bit-parallel Gene Myers algorithm to estimate pairwise distances, adopts a sequence embedding strategy and the bi-secting K-means algorithm to rapidly construct guide trees for thousands of sequences. The new version maintains high alignment accuracy on both protein and nucleotide alignments and scales better than other MSA tools. AVAILABILITY AND IMPLEMENTATION: The source code of Kalign and code to reproduce the results are found here: https://github.com/timolassmann/kalign.
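
Illustrative sketch: the summary mentions a bit-parallel Myers algorithm for estimating pairwise distances. The Python function below is a minimal, non-SIMD illustration of that general technique (the bit-vector edit distance recurrence in the form popularized by Hyyrö), not Kalign's actual code. The function name `myers_edit_distance` and all variable names are hypothetical, and Python's arbitrary-precision integers stand in for the fixed-width machine words a SIMD implementation would use.

```python
def myers_edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between a and b via a bit-parallel recurrence.

    Assumption: this is a teaching sketch. Each bit of the integers below
    represents one row of the dynamic-programming column for pattern `a`.
    """
    if not a:
        return len(b)

    m = len(a)
    full = (1 << m) - 1      # mask that keeps exactly m bits
    top = 1 << (m - 1)       # bit for the last row of the column

    # peq[ch] has bit i set when a[i] == ch.
    peq = {}
    for i, ch in enumerate(a):
        peq[ch] = peq.get(ch, 0) | (1 << i)

    vp = full   # vertical deltas equal to +1 (first column is 0, 1, ..., m)
    vn = 0      # vertical deltas equal to -1
    dist = m    # value in the last row of the current column

    for ch in b:
        x = peq.get(ch, 0)
        # d0 marks rows whose diagonal delta is zero.
        d0 = ((((x & vp) + vp) ^ vp) | x | vn) & full
        hp = (vn | ~(d0 | vp)) & full   # horizontal deltas equal to +1
        hn = d0 & vp                    # horizontal deltas equal to -1

        # Update the running distance from the last row's horizontal delta.
        if hp & top:
            dist += 1
        elif hn & top:
            dist -= 1

        # Shift in +1 at row 0: the top boundary row is 0, 1, 2, ...
        hp = ((hp << 1) | 1) & full
        hn = (hn << 1) & full
        vp = (hn | ~(d0 | hp)) & full
        vn = hp & d0

    return dist
```

For example, `myers_edit_distance("kitten", "sitting")` evaluates to 3. Each character of `b` is processed with a constant number of word-level operations, which is what makes this family of algorithms suitable for vectorization.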