
MC64-ClustalWP2: A Highly-Parallel Hybrid Strategy to Align Multiple Sequences in Many-Core Architectures

We have developed the MC64-ClustalWP2 as a new implementation of the Clustal W algorithm, integrating a novel parallelization strategy and significantly increasing the performance when aligning long sequences in architectures with many cores. It must be stressed that in such a process, the detailed...


Bibliographic Details
Main authors: Díaz, David; Esteban, Francisco J.; Hernández, Pilar; Caballero, Juan Antonio; Guevara, Antonio; Dorado, Gabriel; Gálvez, Sergio
Format: Online Article Text
Language: English
Published: Public Library of Science 2014
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3977933/
https://www.ncbi.nlm.nih.gov/pubmed/24710354
http://dx.doi.org/10.1371/journal.pone.0094044
_version_ 1782310480388292608
author Díaz, David
Esteban, Francisco J.
Hernández, Pilar
Caballero, Juan Antonio
Guevara, Antonio
Dorado, Gabriel
Gálvez, Sergio
author_sort Díaz, David
collection PubMed
description We have developed MC64-ClustalWP2, a new implementation of the Clustal W algorithm that integrates a novel parallelization strategy and significantly increases performance when aligning long sequences on many-core architectures. In such a process, a detailed analysis of both software and hardware features and peculiarities is of paramount importance for revealing the key points at which the full potential of parallelism in many-core CPU systems can be exploited and optimized. The new parallelization approach focuses on the most time-consuming stages of the algorithm. In particular, the performance of the so-called progressive alignment has improved drastically, owing to a fine-grained approach in which the forward and backward loops were unrolled and parallelized. Another key element has been the implementation of the new algorithm on a hybrid computing system that integrates an Intel Xeon multi-core CPU and a Tilera Tile64 many-core card. A comparison with other Clustal W implementations reveals the high performance of the new algorithm and strategy on many-core CPU architectures, in a scenario where the sequences to align are relatively long (more than 10 kb) and, hence, many-core GPU hardware cannot be used. Thus, MC64-ClustalWP2 runs multiple alignments more than 18x faster than the original Clustal W algorithm and more than 7x faster than the best x86 parallel implementation to date, and it is publicly available through a web service.
In addition, these developments have been deployed on cost-effective personal computers and should be useful to life-science researchers for tasks including the identification of identities and differences for mutation/polymorphism analyses; biodiversity and evolutionary studies; and the development of molecular markers for paternity testing, germplasm management and protection, breeding assistance, illegal traffic control, fraud prevention, and the protection of intellectual property (identification/traceability), including protected designations of origin, among other applications.
format Online
Article
Text
id pubmed-3977933
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-3977933 2014-04-11 MC64-ClustalWP2: A Highly-Parallel Hybrid Strategy to Align Multiple Sequences in Many-Core Architectures Díaz, David Esteban, Francisco J. Hernández, Pilar Caballero, Juan Antonio Guevara, Antonio Dorado, Gabriel Gálvez, Sergio PLoS One Research Article Public Library of Science 2014-04-07 /pmc/articles/PMC3977933/ /pubmed/24710354 http://dx.doi.org/10.1371/journal.pone.0094044 Text en © 2014 Díaz et al http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title MC64-ClustalWP2: A Highly-Parallel Hybrid Strategy to Align Multiple Sequences in Many-Core Architectures
topic Research Article