
FAMSA: Fast and accurate multiple sequence alignment of huge protein families

Rapid development of modern sequencing platforms has contributed to the unprecedented growth of protein family databases. The abundance of sets containing hundreds of thousands of sequences is a formidable challenge for multiple sequence alignment algorithms. The article introduces FAMSA, a new pr...


Bibliographic Details
Main Authors: Deorowicz, Sebastian, Debudaj-Grabysz, Agnieszka, Gudyś, Adam
Format: Online Article Text
Language: English
Published: Nature Publishing Group 2016
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5037421/
https://www.ncbi.nlm.nih.gov/pubmed/27670777
http://dx.doi.org/10.1038/srep33964
_version_ 1782455734818045952
author Deorowicz, Sebastian
Debudaj-Grabysz, Agnieszka
Gudyś, Adam
collection PubMed
description Rapid development of modern sequencing platforms has contributed to the unprecedented growth of protein family databases. The abundance of sets containing hundreds of thousands of sequences is a formidable challenge for multiple sequence alignment algorithms. The article introduces FAMSA, a new progressive algorithm designed for fast and accurate alignment of thousands of protein sequences. Its features include the use of the longest common subsequence measure for determining pairwise similarities, a novel method of evaluating gap costs, and a new iterative refinement scheme. Importantly, its implementation is highly optimized and parallelized to make the most of modern computer platforms. As a result, quality indicators, i.e. sum-of-pairs and total-column scores, show FAMSA to be superior to competing algorithms, such as Clustal Omega or MAFFT, for datasets exceeding a few thousand sequences. This quality does not come at the expense of time or memory requirements, which are an order of magnitude lower than those of existing solutions. For example, a family of 415,519 sequences was analyzed in less than two hours and required no more than 8 GB of RAM. FAMSA is available for free at http://sun.aei.polsl.pl/REFRESH/famsa.
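The description names the longest common subsequence (LCS) measure as the basis for pairwise similarity but does not give the formula. The following is a minimal Python sketch of the textbook dynamic-programming LCS length, not FAMSA's optimized and parallelized implementation. The function names and the normalization by the shorter sequence length are assumptions made for illustration only.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b.

    Textbook O(len(a) * len(b)) dynamic programming that keeps only
    one previous row, so memory is proportional to the shorter input.
    """
    if len(a) < len(b):
        a, b = b, a  # make b the shorter sequence
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                # Extend the best subsequence ending before both symbols.
                curr.append(prev[j - 1] + 1)
            else:
                # Skip one symbol from either sequence.
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def lcs_similarity(a: str, b: str) -> float:
    """Hypothetical normalization (an assumption, not taken from the record):
    LCS length divided by the length of the shorter sequence."""
    if not a or not b:
        return 0.0
    return lcs_length(a, b) / min(len(a), len(b))


if __name__ == "__main__":
    # Toy protein fragments, invented for the example.
    print(lcs_length("MKVLAT", "MKLAT"))
    print(lcs_similarity("MKVLAT", "MKLAT"))
```

A progressive aligner would compute such a score for every pair of sequences and feed the resulting matrix into guide-tree construction. The quadratic number of pairs is why the description stresses optimization and parallelization.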
format Online
Article
Text
id pubmed-5037421
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher Nature Publishing Group
record_format MEDLINE/PubMed
spelling pubmed-5037421 2016-09-30 FAMSA: Fast and accurate multiple sequence alignment of huge protein families Deorowicz, Sebastian Debudaj-Grabysz, Agnieszka Gudyś, Adam Sci Rep Article Nature Publishing Group 2016-09-27 /pmc/articles/PMC5037421/ /pubmed/27670777 http://dx.doi.org/10.1038/srep33964 Text en Copyright © 2016, The Author(s) http://creativecommons.org/licenses/by/4.0/ This work is licensed under a Creative Commons Attribution 4.0 International License.
The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in the credit line; if the material is not included under the Creative Commons license, users will need to obtain permission from the license holder to reproduce the material. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/
title FAMSA: Fast and accurate multiple sequence alignment of huge protein families
topic Article