Kalign 3: multiple sequence alignment of large datasets

MOTIVATION: Kalign is an efficient multiple sequence alignment (MSA) program capable of aligning thousands of protein or nucleotide sequences. However, current alignment problems involving large numbers of sequences are exceeding Kalign’s original design specifications. Here we present a completely...

Bibliographic Details
Main Author: Lassmann, Timo
Format: Online Article Text
Language: English
Published: Oxford University Press 2020
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7703769/
https://www.ncbi.nlm.nih.gov/pubmed/31665271
http://dx.doi.org/10.1093/bioinformatics/btz795
author Lassmann, Timo
collection PubMed
description MOTIVATION: Kalign is an efficient multiple sequence alignment (MSA) program capable of aligning thousands of protein or nucleotide sequences. However, current alignment problems involving large numbers of sequences are exceeding Kalign’s original design specifications. Here we present a completely re-written and updated version to meet current and future alignment challenges. RESULTS: Kalign now uses a SIMD (single instruction, multiple data) accelerated version of the bit-parallel Gene Myers algorithm to estimate pairwise distances, adopts a sequence embedding strategy and the bi-secting K-means algorithm to rapidly construct guide trees for thousands of sequences. The new version maintains high alignment accuracy on both protein and nucleotide alignments and scales better than other MSA tools. AVAILABILITY AND IMPLEMENTATION: The source code of Kalign and code to reproduce the results are found here: https://github.com/timolassmann/kalign.
format Online
Article
Text
id pubmed-7703769
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-7703769 2020-12-07 Kalign 3: multiple sequence alignment of large datasets Lassmann, Timo Bioinformatics Applications Note Oxford University Press 2020-03-15 2019-10-26 /pmc/articles/PMC7703769/ /pubmed/31665271 http://dx.doi.org/10.1093/bioinformatics/btz795 Text en © The Author(s) 2019. Published by Oxford University Press. http://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title Kalign 3: multiple sequence alignment of large datasets
topic Applications Note