
Automatic selection of partitioning schemes for phylogenetic analyses using iterative k-means clustering of site rates

BACKGROUND: Model selection is a vital part of most phylogenetic analyses, and accounting for the heterogeneity in evolutionary patterns across sites is particularly important. Mixture models and partitioning are commonly used to account for this variation, and partitioning is the most popular appro...


Bibliographic Details
Main authors:	Frandsen, Paul B; Calcott, Brett; Mayer, Christoph; Lanfear, Robert
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2015
Subjects:	Methodology Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4327964/ https://www.ncbi.nlm.nih.gov/pubmed/25887041 http://dx.doi.org/10.1186/s12862-015-0283-7

_version_	1782357166409121792
author	Frandsen, Paul B; Calcott, Brett; Mayer, Christoph; Lanfear, Robert
author_sort	Frandsen, Paul B
collection	PubMed
description	BACKGROUND: Model selection is a vital part of most phylogenetic analyses, and accounting for the heterogeneity in evolutionary patterns across sites is particularly important. Mixture models and partitioning are commonly used to account for this variation, and partitioning is the most popular approach. Most current partitioning methods require some a priori partitioning scheme to be defined, typically guided by known structural features of the sequences, such as gene boundaries or codon positions. Recent evidence suggests that these a priori boundaries often fail to adequately account for variation in rates and patterns of evolution among sites. Furthermore, new phylogenomic datasets such as those assembled from ultra-conserved elements lack obvious structural features on which to define a priori partitioning schemes. The upshot is that, for many phylogenetic datasets, partitioned models of molecular evolution may be inadequate, thus limiting the accuracy of downstream phylogenetic analyses. RESULTS: We present a new algorithm that automatically selects a partitioning scheme via the iterative division of the alignment into subsets of similar sites based on their rates of evolution. We compare this method to existing approaches using a wide range of empirical datasets, and show that it consistently leads to large increases in the fit of partitioned models of molecular evolution when measured using AICc and BIC scores. In doing so, we demonstrate that some related approaches to solving this problem may have been associated with a small but important bias. CONCLUSIONS: Our method provides an alternative to traditional approaches to partitioning, such as dividing alignments by gene and codon position. Because our method is data-driven, it can be used to estimate partitioned models for all types of alignments, including those that are not amenable to traditional approaches to partitioning.
format	Online Article Text
id	pubmed-4327964
institution	National Center for Biotechnology Information
language	English
publishDate	2015
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-4327964 2015-02-15 BMC Evol Biol Methodology Article BioMed Central 2015-02-10 /pmc/articles/PMC4327964/ /pubmed/25887041 http://dx.doi.org/10.1186/s12862-015-0283-7 Text en © Frandsen et al.; licensee BioMed Central. 2015 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title	Automatic selection of partitioning schemes for phylogenetic analyses using iterative k-means clustering of site rates
topic	Methodology Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4327964/ https://www.ncbi.nlm.nih.gov/pubmed/25887041 http://dx.doi.org/10.1186/s12862-015-0283-7
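
The description field above outlines an iterative division of the alignment into subsets of sites with similar rates, with model fit measured by AICc and BIC. The following Python sketch illustrates that general idea only; it is not the authors' implementation. It assumes that per-site rates are already available, that each split uses k-means with k = 2, and that a split is kept when it lowers the BIC. The function names (`kmeans_two`, `iterative_split`, `toy_subset_lnl`) are hypothetical, and the Gaussian score in `toy_subset_lnl` is a stand-in for a real phylogenetic likelihood.

```python
import math
from typing import Callable, List, Sequence


def aicc(lnl: float, k: int, n: int) -> float:
    """Corrected AIC for log-likelihood lnl, k free parameters, n sites."""
    return 2 * k - 2 * lnl + (2 * k * (k + 1)) / (n - k - 1)


def bic(lnl: float, k: int, n: int) -> float:
    """Bayesian information criterion; lower values indicate better fit."""
    return k * math.log(n) - 2 * lnl


def kmeans_two(values: Sequence[float], iters: int = 100) -> List[int]:
    """One-dimensional k-means with k = 2; returns a 0/1 label per value."""
    lo, hi = min(values), max(values)
    if lo == hi:  # all values identical: nothing to separate
        return [0] * len(values)
    centers = [lo, hi]
    labels: List[int] = []
    for _ in range(iters):
        new = [0 if abs(v - centers[0]) <= abs(v - centers[1]) else 1
               for v in values]
        if new == labels:  # assignments stable: converged
            break
        labels = new
        for j in (0, 1):
            members = [v for v, lab in zip(values, labels) if lab == j]
            if members:
                centers[j] = sum(members) / len(members)
    return labels


def iterative_split(rates: Sequence[float],
                    subset_lnl: Callable[[List[int]], float],
                    k_per_subset: int,
                    min_size: int = 2) -> List[List[int]]:
    """Repeatedly split subsets of site indices while the BIC improves.

    Assumption: subset_lnl scores one subset of sites under its own model.
    """
    n = len(rates)
    pending = [list(range(n))]
    final: List[List[int]] = []
    while pending:
        subset = pending.pop()
        labels = kmeans_two([rates[i] for i in subset])
        parts = [[i for i, lab in zip(subset, labels) if lab == j]
                 for j in (0, 1)]
        if min(len(p) for p in parts) < min_size:
            final.append(subset)
            continue
        parent = bic(subset_lnl(subset), k_per_subset, n)
        children = bic(sum(subset_lnl(p) for p in parts), 2 * k_per_subset, n)
        if children < parent:
            pending.extend(parts)  # keep the split and try to divide further
        else:
            final.append(subset)
    return final


def toy_subset_lnl(rates: Sequence[float]) -> Callable[[List[int]], float]:
    """Stand-in score (NOT a phylogenetic likelihood): Gaussian fit to rates."""
    def score(subset: List[int]) -> float:
        xs = [rates[i] for i in subset]
        mean = sum(xs) / len(xs)
        var = max(sum((x - mean) ** 2 for x in xs) / len(xs), 1e-3)
        return -0.5 * sum(math.log(2 * math.pi * var) + (x - mean) ** 2 / var
                          for x in xs)
    return score


rates = [0.10, 0.11, 0.09, 0.10, 0.12, 5.0, 5.1, 4.9, 5.0, 5.2]
scheme = iterative_split(rates, toy_subset_lnl(rates), k_per_subset=2)
```

In this toy input the first five sites are slow and the last five are fast, so the resulting `scheme` keeps the two groups in separate subsets. In a real analysis the scoring function would come from a phylogenetic likelihood calculation, and `aicc` could replace `bic` as the acceptance criterion.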
