Assessing the efficiency of multiple sequence alignment programs

BACKGROUND: Multiple sequence alignment (MSA) is an extremely useful tool for molecular and evolutionary biology and there are several programs and algorithms available for this purpose. Although previous studies have compared the alignment accuracy of different MSA programs, their computational tim...

Bibliographic Details
Main authors: Pais, Fabiano Sviatopolk-Mirsky; Ruy, Patrícia de Cássia; Oliveira, Guilherme; Coimbra, Roney Santos
Format: Online Article Text
Language: English
Published: BioMed Central 2014
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4015676/
https://www.ncbi.nlm.nih.gov/pubmed/24602402
http://dx.doi.org/10.1186/1748-7188-9-4
_version_ 1782315376161325056
author Pais, Fabiano Sviatopolk-Mirsky
Ruy, Patrícia de Cássia
Oliveira, Guilherme
Coimbra, Roney Santos
collection PubMed
description BACKGROUND: Multiple sequence alignment (MSA) is an extremely useful tool for molecular and evolutionary biology, and several programs and algorithms are available for this purpose. Although previous studies have compared the alignment accuracy of different MSA programs, their computational time and memory usage have not been systematically evaluated. Given the unprecedented amount of data produced by next-generation deep sequencing platforms and the increasing demand for large-scale data analysis, it is imperative to optimize the application of software. Therefore, a balance between alignment accuracy and computational cost has become a critical indicator of the most suitable MSA program. We compared both the accuracy and the cost of nine popular MSA programs, namely CLUSTALW, CLUSTAL OMEGA, DIALIGN-TX, MAFFT, MUSCLE, POA, Probalign, Probcons and T-Coffee, against the benchmark alignment dataset BAliBASE, and discuss the relevance of some implementations embedded in each program’s algorithm. Accuracy of alignment was calculated with the two standard scoring functions provided by BAliBASE, the sum-of-pairs and total-column scores, and computational costs were determined by collecting peak memory usage and time of execution. RESULTS: Our results indicate that the consistency-based programs Probcons, T-Coffee, Probalign and MAFFT mostly outperformed the other programs in accuracy. Whenever sequences with large N/C-terminal extensions were present in the BAliBASE suite, Probalign, MAFFT and also CLUSTAL OMEGA outperformed Probcons and T-Coffee. The drawback of these programs is that they are more memory-greedy and slower than POA, CLUSTALW, DIALIGN-TX and MUSCLE. CLUSTALW and MUSCLE were the fastest programs, with CLUSTALW being the least memory-demanding. CONCLUSIONS: Based on the results presented herein, all four programs Probcons, T-Coffee, Probalign and MAFFT are recommended for better accuracy of multiple sequence alignments. T-Coffee and recent versions of MAFFT can deliver faster, reliable alignments if multi-core computers are available, which makes them especially suited for datasets larger than those encountered in the BAliBASE suite. In fact, parallelization of alignments for multi-core computers should probably be addressed by more programs in the near future, which will certainly improve performance significantly.
format Online
Article
Text
id pubmed-4015676
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4015676 2014-05-10 Assessing the efficiency of multiple sequence alignment programs Pais, Fabiano Sviatopolk-Mirsky Ruy, Patrícia de Cássia Oliveira, Guilherme Coimbra, Roney Santos Algorithms Mol Biol Research BioMed Central 2014-03-06 /pmc/articles/PMC4015676/ /pubmed/24602402 http://dx.doi.org/10.1186/1748-7188-9-4 Text en Copyright © 2014 Pais et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited.
title Assessing the efficiency of multiple sequence alignment programs
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4015676/
https://www.ncbi.nlm.nih.gov/pubmed/24602402
http://dx.doi.org/10.1186/1748-7188-9-4