
Scalable Empirical Mixture Models That Account for Across-Site Compositional Heterogeneity

Biochemical demands constrain the range of amino acids acceptable at specific sites resulting in across-site compositional heterogeneity of the amino acid replacement process. Phylogenetic models that disregard this heterogeneity are prone to systematic errors, which can lead to severe long-branch a...


Bibliographic Details
Main Authors:	Schrempf, Dominik; Lartillot, Nicolas; Szöllősi, Gergely
Format:	Online Article Text
Language:	English
Published:	Oxford University Press 2020
Subjects:	Methods
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7743758/ https://www.ncbi.nlm.nih.gov/pubmed/32877529 http://dx.doi.org/10.1093/molbev/msaa145

author	Schrempf, Dominik; Lartillot, Nicolas; Szöllősi, Gergely
author_facet	Schrempf, Dominik; Lartillot, Nicolas; Szöllősi, Gergely
author_sort	Schrempf, Dominik
collection	PubMed
description	Biochemical demands constrain the range of amino acids acceptable at specific sites resulting in across-site compositional heterogeneity of the amino acid replacement process. Phylogenetic models that disregard this heterogeneity are prone to systematic errors, which can lead to severe long-branch attraction artifacts. State-of-the-art models accounting for across-site compositional heterogeneity include the CAT model, which is computationally expensive, and empirical distribution mixture models estimated via maximum likelihood (C10–C60 models). Here, we present a new, scalable method EDCluster for finding empirical distribution mixture models involving a simple cluster analysis. The cluster analysis utilizes specific coordinate transformations which allow the detection of specialized amino acid distributions either from curated databases or from the alignment at hand. We apply EDCluster to the HOGENOM and HSSP databases in order to provide universal distribution mixture (UDM) models comprising up to 4,096 components. Detailed analyses of the UDM models demonstrate the removal of various long-branch attraction artifacts and improved performance compared with the C10–C60 models. Ready-to-use implementations of the UDM models are provided for three established software packages (IQ-TREE, Phylobayes, and RevBayes).
format	Online Article Text
id	pubmed-7743758
institution	National Center for Biotechnology Information
language	English
publishDate	2020
publisher	Oxford University Press
record_format	MEDLINE/PubMed
spelling	pubmed-7743758 2020-12-21 Scalable Empirical Mixture Models That Account for Across-Site Compositional Heterogeneity Schrempf, Dominik; Lartillot, Nicolas; Szöllősi, Gergely Mol Biol Evol Methods Oxford University Press 2020-09-08 /pmc/articles/PMC7743758/ /pubmed/32877529 http://dx.doi.org/10.1093/molbev/msaa145 Text en © The Author(s) 2020. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. http://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title	Scalable Empirical Mixture Models That Account for Across-Site Compositional Heterogeneity
topic	Methods
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7743758/ https://www.ncbi.nlm.nih.gov/pubmed/32877529 http://dx.doi.org/10.1093/molbev/msaa145
