Kalign 3: multiple sequence alignment of large datasets
MOTIVATION: Kalign is an efficient multiple sequence alignment (MSA) program capable of aligning thousands of protein or nucleotide sequences. However, current alignment problems involving large numbers of sequences are exceeding Kalign’s original design specifications. Here we present a completely...
Main author: | Lassmann, Timo |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press 2020 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7703769/ https://www.ncbi.nlm.nih.gov/pubmed/31665271 http://dx.doi.org/10.1093/bioinformatics/btz795 |
_version_ | 1783616692159512576 |
---|---|
author | Lassmann, Timo |
collection | PubMed |
description | MOTIVATION: Kalign is an efficient multiple sequence alignment (MSA) program capable of aligning thousands of protein or nucleotide sequences. However, current alignment problems involving large numbers of sequences are exceeding Kalign’s original design specifications. Here we present a completely re-written and updated version to meet current and future alignment challenges. RESULTS: Kalign now uses a SIMD (single instruction, multiple data) accelerated version of the bit-parallel Gene Myers algorithm to estimate pairwise distances, adopts a sequence embedding strategy and the bi-secting K-means algorithm to rapidly construct guide trees for thousands of sequences. The new version maintains high alignment accuracy on both protein and nucleotide alignments and scales better than other MSA tools. AVAILABILITY AND IMPLEMENTATION: The source code of Kalign and code to reproduce the results are found here: https://github.com/timolassmann/kalign. |
format | Online Article Text |
id | pubmed-7703769 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-7703769 2020-12-07 Kalign 3: multiple sequence alignment of large datasets Lassmann, Timo Bioinformatics Applications Note Oxford University Press 2020-03-15 2019-10-26 /pmc/articles/PMC7703769/ /pubmed/31665271 http://dx.doi.org/10.1093/bioinformatics/btz795 Text en © The Author(s) 2019. Published by Oxford University Press. http://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com |
title | Kalign 3: multiple sequence alignment of large datasets |
topic | Applications Note |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7703769/ https://www.ncbi.nlm.nih.gov/pubmed/31665271 http://dx.doi.org/10.1093/bioinformatics/btz795 |
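
The abstract mentions a bit-parallel Myers algorithm for estimating pairwise distances. As a rough illustration of that idea only, the sketch below computes the edit distance between two strings with the bit-vector recurrence, using Python's arbitrary-width integers in place of machine words. It is not Kalign's code: Kalign's version is SIMD-accelerated, and the function name `myers_edit_distance` is a hypothetical name chosen for this example.

```python
def myers_edit_distance(a: str, b: str) -> int:
    """Edit (Levenshtein) distance via a bit-parallel Myers-style recurrence.

    Illustrative sketch only; a Python int stands in for the machine word.
    """
    m = len(a)
    if m == 0:
        return len(b)

    mask = (1 << m) - 1      # keep only the m low bits
    high = 1 << (m - 1)      # bit for the last row of the DP column

    # peq[ch] has bit i set when a[i] == ch
    peq = {}
    for i, ch in enumerate(a):
        peq[ch] = peq.get(ch, 0) | (1 << i)

    vp = mask                # vertical deltas of +1 (first column is 0..m)
    vn = 0                   # vertical deltas of -1
    score = m                # DP value in the last row, current column

    for ch in b:
        eq = peq.get(ch, 0)
        # d0 marks cells whose diagonal delta is zero
        d0 = ((((eq & vp) + vp) ^ vp) | eq | vn) & mask
        hp = (vn | ~(d0 | vp)) & mask   # horizontal deltas of +1
        hn = d0 & vp                    # horizontal deltas of -1

        if hp & high:
            score += 1
        if hn & high:
            score -= 1

        # shift in a +1 at row 0, since the top DP row grows by one per column
        hp = ((hp << 1) | 1) & mask
        hn = (hn << 1) & mask
        vp = (hn | ~(d0 | hp)) & mask
        vn = hp & d0

    return score
```

Each column of the dynamic-programming matrix is updated with a constant number of word operations, which is what makes the approach attractive for computing many pairwise distances.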