
Time and memory efficient likelihood-based tree searches on phylogenomic alignments with missing data

Motivation: The current molecular data explosion poses new challenges for large-scale phylogenomic analyses that can comprise hundreds or even thousands of genes. A property that characterizes phylogenomic datasets is that they tend to be gappy, i.e. can contain taxa with (many and disparate) missin...


Bibliographic Details
Main authors: Stamatakis, Alexandros; Alachiotis, Nikolaos
Format: Text
Language: English
Published: Oxford University Press 2010
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2881390/
https://www.ncbi.nlm.nih.gov/pubmed/20529898
http://dx.doi.org/10.1093/bioinformatics/btq205
author Stamatakis, Alexandros
Alachiotis, Nikolaos
collection PubMed
description Motivation: The current molecular data explosion poses new challenges for large-scale phylogenomic analyses that can comprise hundreds or even thousands of genes. A property that characterizes phylogenomic datasets is that they tend to be gappy, i.e. can contain taxa with (many and disparate) missing genes. In current phylogenomic analyses, this type of alignment gappyness that is induced by missing data frequently exceeds 90%. We present and implement a generally applicable mechanism that allows for reducing memory footprints of likelihood-based [maximum likelihood (ML) or Bayesian] phylogenomic analyses proportional to the amount of missing data in the alignment. We also introduce a set of algorithmic rules to efficiently conduct tree searches via subtree pruning and re-grafting moves using this mechanism.
Results: On a large phylogenomic DNA dataset with 2177 taxa, 68 genes and a gappyness of 90%, we achieve a memory footprint reduction from 9 GB down to 1 GB, a speedup for optimizing ML model parameters of 11, and accelerate the Subtree Pruning Regrafting tree search phase by factor 16. Thus, our approach can be deployed to improve efficiency for the two most important resources, CPU time and memory, by up to one order of magnitude.
Availability: Current open-source version of RAxML v7.2.6 available at http://wwwkramer.in.tum.de/exelixis/software.html.
Contact: stamatak@cs.tum.edu
format Text
id pubmed-2881390
institution National Center for Biotechnology Information
language English
publishDate 2010
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-2881390 2010-06-08 Time and memory efficient likelihood-based tree searches on phylogenomic alignments with missing data Stamatakis, Alexandros Alachiotis, Nikolaos Bioinformatics ISMB 2010 Conference Proceedings, July 11 to July 13, 2010, Boston, MA, USA Oxford University Press 2010-06-15 2010-06-01 /pmc/articles/PMC2881390/ /pubmed/20529898 http://dx.doi.org/10.1093/bioinformatics/btq205 Text en © The Author(s) 2010. Published by Oxford University Press.
http://creativecommons.org/licenses/by-nc/2.0/uk/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.5), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Time and memory efficient likelihood-based tree searches on phylogenomic alignments with missing data
topic ISMB 2010 Conference Proceedings, July 11 to July 13, 2010, Boston, MA, USA