Exploring parallel MPI fault tolerance mechanisms for phylogenetic inference with RAxML-NG

MOTIVATION: Phylogenetic trees are now routinely inferred on large scale high performance computing systems with thousands of cores as the parallel scalability of phylogenetic inference tools has improved over the past years to cope with the molecular data avalanche. Thus, the parallel fault toleran...

Bibliographic Details
Main authors: Hübner, Lukas; Kozlov, Alexey M; Hespe, Demian; Sanders, Peter; Stamatakis, Alexandros
Format: Online Article Text
Language: English
Published: Oxford University Press 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9502163/
https://www.ncbi.nlm.nih.gov/pubmed/34037680
http://dx.doi.org/10.1093/bioinformatics/btab399
author Hübner, Lukas
Kozlov, Alexey M
Hespe, Demian
Sanders, Peter
Stamatakis, Alexandros
author_sort Hübner, Lukas
collection PubMed
description MOTIVATION: Phylogenetic trees are now routinely inferred on large scale high performance computing systems with thousands of cores as the parallel scalability of phylogenetic inference tools has improved over the past years to cope with the molecular data avalanche. Thus, the parallel fault tolerance of phylogenetic inference tools has become a relevant challenge. To this end, we explore parallel fault tolerance mechanisms and algorithms, the software modifications required and the performance penalties induced via enabling parallel fault tolerance by example of RAxML-NG, the successor of the widely used RAxML tool for maximum likelihood-based phylogenetic tree inference. RESULTS: We find that the slowdown induced by the necessary additional recovery mechanisms in RAxML-NG is on average 1.00 ± 0.04. The overall slowdown by using these recovery mechanisms in conjunction with a fault-tolerant Message Passing Interface implementation amounts to on average 1.7 ± 0.6 for large empirical datasets. Via failure simulations, we show that RAxML-NG can successfully recover from multiple simultaneous failures, subsequent failures, failures during recovery and failures during checkpointing. Recoveries are automatic and transparent to the user. AVAILABILITY AND IMPLEMENTATION: The modified fault-tolerant RAxML-NG code is available under GNU GPL at https://github.com/lukashuebner/ft-raxml-ng. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
format Online
Article
Text
id pubmed-9502163
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-9502163 2022-09-26 Exploring parallel MPI fault tolerance mechanisms for phylogenetic inference with RAxML-NG Hübner, Lukas; Kozlov, Alexey M; Hespe, Demian; Sanders, Peter; Stamatakis, Alexandros Bioinformatics Original Papers Oxford University Press 2021-05-26 /pmc/articles/PMC9502163/ /pubmed/34037680 http://dx.doi.org/10.1093/bioinformatics/btab399 Text en © The Author(s) 2021. Published by Oxford University Press.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title Exploring parallel MPI fault tolerance mechanisms for phylogenetic inference with RAxML-NG
topic Original Papers