Designing Weights for Quartet-Based Methods When Data are Heterogeneous Across Lineages

Homogeneity across lineages is a general assumption in phylogenetics according to which nucleotide substitution rates are common to all lineages. Many phylogenetic methods relax this hypothesis but keep a simple enough model to make the process of sequence evolution more tractable. On the other hand...

Bibliographic Details
Main authors: Casanellas, Marta; Fernández-Sánchez, Jesús; Garrote-López, Marina; Sabaté-Vidales, Marc
Format: Online Article Text
Language: English
Published: Springer US 2023
Subjects: Methods
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10264505/
https://www.ncbi.nlm.nih.gov/pubmed/37310552
http://dx.doi.org/10.1007/s11538-023-01167-y
_version_ 1785058338180956160
author Casanellas, Marta
Fernández-Sánchez, Jesús
Garrote-López, Marina
Sabaté-Vidales, Marc
author_sort Casanellas, Marta
collection PubMed
description Homogeneity across lineages is a general assumption in phylogenetics according to which nucleotide substitution rates are common to all lineages. Many phylogenetic methods relax this hypothesis but keep a simple enough model to make the process of sequence evolution more tractable. On the other hand, dealing successfully with the general case (heterogeneity of rates across lineages) is one of the key features of phylogenetic reconstruction methods based on algebraic tools. The goal of this paper is twofold. First, we present a new weighting system for quartets (ASAQ) based on algebraic and semi-algebraic tools, thus especially indicated to deal with data evolving under heterogeneous rates. This method combines the weights of two previous methods by means of a test based on the positivity of the branch lengths estimated with the paralinear distance. ASAQ is statistically consistent when applied to data generated under the general Markov model, considers rate and base composition heterogeneity among lineages and does not assume stationarity nor time-reversibility. Second, we test and compare the performance of several quartet-based methods for phylogenetic tree reconstruction (namely QFM, wQFM, quartet puzzling, weight optimization and Willson’s method) in combination with several systems of weights, including ASAQ weights and other weights based on algebraic and semi-algebraic methods or on the paralinear distance. These tests are applied to both simulated and real data and support weight optimization with ASAQ weights as a reliable and successful reconstruction method that improves upon the accuracy of global methods (such as neighbor-joining or maximum likelihood) in the presence of long branches or on mixtures of distributions on trees.
format Online
Article
Text
id pubmed-10264505
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Springer US
record_format MEDLINE/PubMed
spelling pubmed-10264505 2023-06-15 Designing Weights for Quartet-Based Methods When Data are Heterogeneous Across Lineages. Casanellas, Marta; Fernández-Sánchez, Jesús; Garrote-López, Marina; Sabaté-Vidales, Marc. Bull Math Biol. Methods. Springer US 2023-06-13 2023 /pmc/articles/PMC10264505/ /pubmed/37310552 http://dx.doi.org/10.1007/s11538-023-01167-y Text en © The Author(s) 2023 https://creativecommons.org/licenses/by/4.0/
Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
title Designing Weights for Quartet-Based Methods When Data are Heterogeneous Across Lineages
title_sort designing weights for quartet-based methods when data are heterogeneous across lineages
topic Methods
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10264505/
https://www.ncbi.nlm.nih.gov/pubmed/37310552
http://dx.doi.org/10.1007/s11538-023-01167-y
work_keys_str_mv AT casanellasmarta designingweightsforquartetbasedmethodswhendataareheterogeneousacrosslineages
AT fernandezsanchezjesus designingweightsforquartetbasedmethodswhendataareheterogeneousacrosslineages
AT garrotelopezmarina designingweightsforquartetbasedmethodswhendataareheterogeneousacrosslineages
AT sabatevidalesmarc designingweightsforquartetbasedmethodswhendataareheterogeneousacrosslineages
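
The description above mentions a test based on the positivity of the branch lengths estimated with the paralinear distance. The record does not give the details of that test or of how ASAQ combines weights, so the following is only a minimal sketch of the two ingredients as they are commonly stated: the paralinear distance between two aligned sequences, written here without any scaling constant (an assumption that does not affect signs), and the standard four-point formulas for the five branch lengths of a quartet. The function names (`paralinear`, `branch_lengths`, `all_positive`) and the input conventions are hypothetical and are not taken from the paper.

```python
import math

BASES = "ACGT"


def det(m):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in m]
    n = len(a)
    d = 1.0
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(a[r][c]))
        if a[p][c] == 0.0:
            return 0.0
        if p != c:
            a[c], a[p] = a[p], a[c]
            d = -d
        d *= a[c][c]
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            for k in range(c, n):
                a[r][k] -= f * a[c][k]
    return d


def paralinear(x, y):
    """Paralinear distance -ln(|det J| / sqrt(det Dx * det Dy)).

    J is the 4x4 matrix of joint base frequencies of the two aligned
    sequences; Dx and Dy are the diagonal matrices of their marginals.
    Raises ValueError if J is singular (for example, a base is missing).
    """
    n = len(x)
    J = [[0.0] * 4 for _ in range(4)]
    for s, t in zip(x, y):
        J[BASES.index(s)][BASES.index(t)] += 1.0 / n
    fx = [sum(row) for row in J]
    fy = [sum(J[i][j] for i in range(4)) for j in range(4)]
    return -math.log(abs(det(J)) / math.sqrt(math.prod(fx) * math.prod(fy)))


def branch_lengths(d, a, b, c, e):
    """Four-point estimates for the quartet topology ab|ce.

    d is a symmetric matrix of pairwise distances indexed by taxon.
    Returns (internal, [pendant_a, pendant_b, pendant_c, pendant_e]).
    """
    internal = (d[a][c] + d[b][e] - d[a][b] - d[c][e]) / 2
    pendant = [
        (d[a][b] + d[a][c] - d[b][c]) / 2,
        (d[a][b] + d[b][c] - d[a][c]) / 2,
        (d[c][e] + d[a][c] - d[a][e]) / 2,
        (d[c][e] + d[a][e] - d[a][c]) / 2,
    ]
    return internal, pendant


def all_positive(d, a, b, c, e):
    """True if every estimated branch length of ab|ce is positive."""
    internal, pendant = branch_lengths(d, a, b, c, e)
    return internal > 0 and all(p > 0 for p in pendant)
```

With distances that are additive on a tree with topology 01|23, `all_positive(d, 0, 1, 2, 3)` holds, while the competing topology 02|13 yields a negative internal branch estimate; a positivity check of this kind can therefore discriminate between candidate quartet topologies.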