
Assessing the efficiency of multiple sequence alignment programs



Bibliographic Details
Main authors:	Pais, Fabiano Sviatopolk-Mirsky; Ruy, Patrícia de Cássia; Oliveira, Guilherme; Coimbra, Roney Santos
Format:	Online Article Text
Language:	English
Published:	BioMed Central, 2014
Subjects:	Research
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4015676/ https://www.ncbi.nlm.nih.gov/pubmed/24602402 http://dx.doi.org/10.1186/1748-7188-9-4

description	BACKGROUND: Multiple sequence alignment (MSA) is an extremely useful tool for molecular and evolutionary biology, and several programs and algorithms are available for this purpose. Although previous studies have compared the alignment accuracy of different MSA programs, their computational time and memory usage have not been systematically evaluated. Given the unprecedented amount of data produced by next-generation deep sequencing platforms and the increasing demand for large-scale data analysis, it is imperative to optimize how software is applied. A balance between alignment accuracy and computational cost has therefore become a critical criterion for choosing the most suitable MSA program. We compared both the accuracy and the cost of nine popular MSA programs, namely CLUSTALW, CLUSTAL OMEGA, DIALIGN-TX, MAFFT, MUSCLE, POA, Probalign, Probcons and T-Coffee, against the benchmark alignment dataset BAliBASE, and discussed the relevance of some implementations embedded in each program's algorithm. Alignment accuracy was calculated with the two standard scoring functions provided by BAliBASE, the sum-of-pairs and total-column scores, and computational cost was determined by recording peak memory usage and execution time.
RESULTS: Our results indicate that the consistency-based programs Probcons, T-Coffee, Probalign and MAFFT mostly outperformed the other programs in accuracy. Whenever sequences with large N/C-terminal extensions were present in the BAliBASE suite, Probalign, MAFFT and also CLUSTAL OMEGA outperformed Probcons and T-Coffee. The drawback of these programs is that they use more memory and run more slowly than POA, CLUSTALW, DIALIGN-TX and MUSCLE. CLUSTALW and MUSCLE were the fastest programs, and CLUSTALW was the least memory-demanding.
CONCLUSIONS: Based on the results presented herein, all four programs, Probcons, T-Coffee, Probalign and MAFFT, are recommended when alignment accuracy is the priority. If multi-core computers are available, T-Coffee and recent versions of MAFFT can deliver faster, reliable alignments, which makes them especially suited for datasets larger than those in the BAliBASE suite. In fact, more programs should probably support parallelized alignment on multi-core computers in the near future, which should significantly improve performance.
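The abstract names the two BAliBASE accuracy measures without defining them. As a rough illustration, the sketch below computes a sum-of-pairs score (the fraction of residue pairs aligned in the reference alignment that are also aligned in the test alignment) and a total-column score (the fraction of reference columns reproduced exactly in the test alignment). It is not BAliBASE's own scoring program, and the function names are hypothetical. It assumes both alignments hold the same sequences in the same order, with `-` as the only gap character, and it scores every column, whereas BAliBASE's scoring tool can restrict scoring to annotated core blocks. Counting only reference columns with at least two residues for the total-column score is also an assumption of this sketch.

```python
from itertools import combinations


def residue_columns(alignment):
    """Turn gapped rows into columns of per-sequence residue indices.

    Each column is a tuple holding, for every sequence, the ungapped index
    of the residue in that column, or None where the sequence has a gap.
    """
    if len({len(row) for row in alignment}) != 1:
        raise ValueError("all rows of an alignment must have the same length")
    counters = [0] * len(alignment)
    columns = []
    for col in range(len(alignment[0])):
        entry = []
        for s, row in enumerate(alignment):
            if row[col] == "-":
                entry.append(None)
            else:
                entry.append(counters[s])
                counters[s] += 1
        columns.append(tuple(entry))
    return columns


def aligned_pairs(columns):
    """Collect every pair of residues that share a column."""
    pairs = set()
    for column in columns:
        present = [(s, r) for s, r in enumerate(column) if r is not None]
        pairs.update(combinations(present, 2))  # ordered by sequence index
    return pairs


def check_same_sequences(test, reference):
    # Both alignments must contain the same ungapped sequences, same order.
    if [r.replace("-", "") for r in test] != [r.replace("-", "") for r in reference]:
        raise ValueError("alignments do not contain the same sequences")


def sp_score(test, reference):
    """Fraction of reference residue pairs also aligned in the test."""
    check_same_sequences(test, reference)
    ref_pairs = aligned_pairs(residue_columns(reference))
    test_pairs = aligned_pairs(residue_columns(test))
    return len(ref_pairs & test_pairs) / len(ref_pairs) if ref_pairs else 0.0


def tc_score(test, reference):
    """Fraction of reference columns (>= 2 residues) reproduced exactly."""
    check_same_sequences(test, reference)
    test_cols = set(residue_columns(test))
    ref_cols = [c for c in residue_columns(reference)
                if sum(r is not None for r in c) >= 2]
    if not ref_cols:
        return 0.0
    return sum(c in test_cols for c in ref_cols) / len(ref_cols)


if __name__ == "__main__":
    reference = ["ACGT", "ACGT", "AC-T"]
    test = ["ACGT", "ACGT", "A-CT"]
    print(sp_score(test, reference), tc_score(test, reference))  # prints 0.8 0.5
```

In this toy example, misplacing one residue of the third sequence breaks only two of the ten reference pairs, but it spoils two of the four reference columns. This is why the total-column score is the stricter of the two measures.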
id	pubmed-4015676
institution	National Center for Biotechnology Information
record_format	MEDLINE/PubMed
source	Algorithms Mol Biol (Research), BioMed Central, published 2014-03-06; record dated 2014-05-10
rights	Copyright © 2014 Pais et al.; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited.


Similar items