
Selecting optimal partitioning schemes for phylogenomic datasets

Bibliographic Details
Main authors:	Lanfear, Robert, Calcott, Brett, Kainer, David, Mayer, Christoph, Stamatakis, Alexandros
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2014
Subjects:	Methodology Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4012149/ https://www.ncbi.nlm.nih.gov/pubmed/24742000 http://dx.doi.org/10.1186/1471-2148-14-82

collection	PubMed
description	BACKGROUND: Partitioning involves estimating independent models of molecular evolution for different subsets of sites in a sequence alignment, and has been shown to improve phylogenetic inference. Current methods for estimating best-fit partitioning schemes, however, are only computationally feasible with datasets of fewer than 100 loci. This is a problem because datasets with thousands of loci are increasingly common in phylogenetics. METHODS: We develop two novel methods for estimating best-fit partitioning schemes on large phylogenomic datasets: strict and relaxed hierarchical clustering. These methods use information from the underlying data to cluster together similar subsets of sites in an alignment, and build on clustering approaches that have been proposed elsewhere. RESULTS: We compare the performance of our methods to each other, and to existing methods for selecting partitioning schemes. We demonstrate that while strict hierarchical clustering has the best computational efficiency on very large datasets, relaxed hierarchical clustering provides scalable efficiency and returns dramatically better partitioning schemes as assessed by common criteria such as AICc and BIC scores. CONCLUSIONS: These two methods provide the best current approaches to inferring partitioning schemes for very large datasets. We provide free open-source implementations of the methods in the PartitionFinder software. We hope that the use of these methods will help to improve the inferences made from large phylogenomic datasets.
id	pubmed-4012149
institution	National Center for Biotechnology Information
record_format	MEDLINE/PubMed
spelling	pubmed-4012149 2014-05-08 BMC Evol Biol, Methodology Article. BioMed Central 2014-04-17 /pmc/articles/PMC4012149/ /pubmed/24742000 http://dx.doi.org/10.1186/1471-2148-14-82 Text en Copyright © 2014 Lanfear et al.; licensee BioMed Central Ltd.
http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
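The abstract describes scoring candidate partitioning schemes with information criteria (AICc, BIC) and building those schemes by clustering similar subsets of sites, either strictly or in a relaxed way. The Python sketch below illustrates that general idea under heavy simplifying assumptions; it is not the PartitionFinder implementation. As a hypothetical stand-in for fitting substitution models, each subset is a list of numbers modelled as a Gaussian (2 free parameters per subset), and "similarity" is the Euclidean distance between fitted (mean, sd) pairs. Scores use the standard forms BIC = k ln n - 2 ln L and AICc = 2k - 2 ln L + 2k(k+1)/(n - k - 1), where k is the total number of free parameters and n the number of data points. The function names and the `fraction` parameter (the share of closest pairs the relaxed variant evaluates at each step) are illustrative assumptions.

```python
import math
from itertools import combinations

# ASSUMPTION (toy stand-in): each "subset of sites" is a list of real numbers
# modelled as i.i.d. Gaussian. Real partitioning software fits substitution
# models instead; only the scoring and clustering logic is illustrated here.
PARAMS_PER_SUBSET = 2  # mean and variance of the toy Gaussian


def fit(values):
    """Return (mean, sd, log-likelihood) of the maximum-likelihood Gaussian."""
    n = len(values)
    mean = sum(values) / n
    sum_sq = sum((v - mean) ** 2 for v in values)
    var = max(sum_sq / n, 1e-9)  # guard against zero variance
    log_lik = -0.5 * n * math.log(2 * math.pi * var) - sum_sq / (2 * var)
    return mean, math.sqrt(var), log_lik


def score(scheme, criterion="BIC"):
    """Information-criterion score of a scheme (list of subsets); lower is better."""
    n = sum(len(subset) for subset in scheme)
    k = PARAMS_PER_SUBSET * len(scheme)
    log_lik = sum(fit(subset)[2] for subset in scheme)
    if criterion == "BIC":
        return k * math.log(n) - 2 * log_lik
    if criterion == "AICc":
        return 2 * k - 2 * log_lik + 2 * k * (k + 1) / (n - k - 1)
    raise ValueError(f"unknown criterion: {criterion}")


def merge(scheme, i, j):
    """Return a new scheme with subsets i and j pooled into one subset."""
    rest = [s for idx, s in enumerate(scheme) if idx not in (i, j)]
    return rest + [scheme[i] + scheme[j]]


def pairs_by_similarity(scheme):
    """All index pairs, closest first, by distance between (mean, sd) estimates."""
    params = [fit(s)[:2] for s in scheme]
    return sorted(combinations(range(len(scheme)), 2),
                  key=lambda p: math.dist(params[p[0]], params[p[1]]))


def strict_clustering(subsets, criterion="BIC"):
    """Always merge the closest pair; return the best (score, scheme) seen."""
    scheme = [list(s) for s in subsets]
    best = (score(scheme, criterion), scheme)
    while len(scheme) > 1:
        i, j = pairs_by_similarity(scheme)[0]
        scheme = merge(scheme, i, j)
        best = min(best, (score(scheme, criterion), scheme), key=lambda t: t[0])
    return best


def relaxed_clustering(subsets, fraction=0.1, criterion="BIC"):
    """Score merges of the closest `fraction` of pairs and apply the best one;
    stop as soon as no candidate merge improves the score."""
    scheme = [list(s) for s in subsets]
    current = score(scheme, criterion)
    while len(scheme) > 1:
        pairs = pairs_by_similarity(scheme)
        n_try = max(1, math.ceil(fraction * len(pairs)))
        candidates = [merge(scheme, i, j) for i, j in pairs[:n_try]]
        new_score, new_scheme = min(((score(c, criterion), c) for c in candidates),
                                    key=lambda t: t[0])
        if new_score >= current:
            break
        scheme, current = new_scheme, new_score
    return current, scheme


if __name__ == "__main__":
    # Hypothetical data: two groups of subsets with clearly different means.
    subsets = [
        [0.1, 0.3, -0.2, 0.0, 0.2],
        [0.2, -0.1, 0.1, 0.0, -0.3],
        [5.1, 4.8, 5.3, 5.0, 4.9],
        [4.9, 5.2, 5.0, 5.1, 4.7],
    ]
    for name, method in [("strict", strict_clustering), ("relaxed", relaxed_clustering)]:
        best_score, best_scheme = method(subsets)
        print(name, len(best_scheme), round(best_score, 2))
```

In this toy setup the strict variant evaluates exactly one new scheme per merge and keeps the best scheme seen along the whole merge path, while the relaxed variant spends extra score evaluations on several of the closest pairs per step and stops at the first step that brings no improvement.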
