Hypothesis Testing With Rank Conditions in Phylogenetics

A phylogenetic model of sequence evolution for a set of n taxa is a collection of probability distributions on the 4^n possible site patterns that may be observed in their aligned DNA sequences. For a four-taxon model, one can arrange the entries of these probability distributions into three flatte...


Bibliographic Details
Main Authors: Long, Colby; Kubatko, Laura
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2021
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8283673/
https://www.ncbi.nlm.nih.gov/pubmed/34276772
http://dx.doi.org/10.3389/fgene.2021.664357
_version_ 1783723254225043456
author Long, Colby
Kubatko, Laura
author_facet Long, Colby
Kubatko, Laura
author_sort Long, Colby
collection PubMed
description A phylogenetic model of sequence evolution for a set of n taxa is a collection of probability distributions on the 4^n possible site patterns that may be observed in their aligned DNA sequences. For a four-taxon model, one can arrange the entries of these probability distributions into three flattening matrices that correspond to the three different unrooted leaf-labeled four-leaf trees, or quartet trees. The flattening matrix corresponding to the tree parameter of the model is known to satisfy certain rank conditions. Methods such as ErikSVD and SVDQuartets take advantage of this observation by applying singular value decomposition to flattening matrices consisting of empirical data. Each possible quartet is assigned an “SVD score” based on how close the flattening is to the set of matrices of the predicted rank. When choosing among possible quartets, the one with the lowest score is inferred to be the phylogeny of the four taxa under consideration. Since an n-leaf phylogenetic tree is determined by its quartets, this approach can be generalized to infer larger phylogenies. In this article, we explore using the SVD score as a test statistic to test whether phylogenetic data were generated by a particular quartet tree. To do so, we use several results to approximate the distribution of the SVD score and to give upper bounds on the p-value of the associated hypothesis tests. We also apply these hypothesis tests to simulated phylogenetic data and discuss the implications for interpreting SVD scores in rank-based inference methods.
format Online
Article
Text
id pubmed-8283673
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-8283673 2021-07-17 Hypothesis Testing With Rank Conditions in Phylogenetics Long, Colby Kubatko, Laura Front Genet Genetics A phylogenetic model of sequence evolution for a set of n taxa is a collection of probability distributions on the 4^n possible site patterns that may be observed in their aligned DNA sequences. For a four-taxon model, one can arrange the entries of these probability distributions into three flattening matrices that correspond to the three different unrooted leaf-labeled four-leaf trees, or quartet trees. The flattening matrix corresponding to the tree parameter of the model is known to satisfy certain rank conditions. Methods such as ErikSVD and SVDQuartets take advantage of this observation by applying singular value decomposition to flattening matrices consisting of empirical data. Each possible quartet is assigned an “SVD score” based on how close the flattening is to the set of matrices of the predicted rank. When choosing among possible quartets, the one with the lowest score is inferred to be the phylogeny of the four taxa under consideration. Since an n-leaf phylogenetic tree is determined by its quartets, this approach can be generalized to infer larger phylogenies. In this article, we explore using the SVD score as a test statistic to test whether phylogenetic data were generated by a particular quartet tree. To do so, we use several results to approximate the distribution of the SVD score and to give upper bounds on the p-value of the associated hypothesis tests. We also apply these hypothesis tests to simulated phylogenetic data and discuss the implications for interpreting SVD scores in rank-based inference methods. Frontiers Media S.A. 2021-07-02 /pmc/articles/PMC8283673/ /pubmed/34276772 http://dx.doi.org/10.3389/fgene.2021.664357 Text en Copyright © 2021 Long and Kubatko. 
https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
spellingShingle Genetics
Long, Colby
Kubatko, Laura
Hypothesis Testing With Rank Conditions in Phylogenetics
title Hypothesis Testing With Rank Conditions in Phylogenetics
title_full Hypothesis Testing With Rank Conditions in Phylogenetics
title_fullStr Hypothesis Testing With Rank Conditions in Phylogenetics
title_full_unstemmed Hypothesis Testing With Rank Conditions in Phylogenetics
title_short Hypothesis Testing With Rank Conditions in Phylogenetics
title_sort hypothesis testing with rank conditions in phylogenetics
topic Genetics
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8283673/
https://www.ncbi.nlm.nih.gov/pubmed/34276772
http://dx.doi.org/10.3389/fgene.2021.664357
work_keys_str_mv AT longcolby hypothesistestingwithrankconditionsinphylogenetics
AT kubatkolaura hypothesistestingwithrankconditionsinphylogenetics
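
The SVD score mentioned in the description can be illustrated with a minimal Python sketch. It is not the authors' implementation or the code of ErikSVD or SVDQuartets. It assumes the predicted rank is 4 (the record does not state the rank), that the score is the Frobenius distance from the empirical flattening to the nearest matrix of that rank, and that sites containing characters other than A, C, G, T are skipped. The function names and split labels are hypothetical.

```python
import numpy as np

STATES = "ACGT"
INDEX = {s: i for i, s in enumerate(STATES)}

# The three unrooted quartet trees on taxa 1..4, written as splits of
# sequence positions (0-based). Labels are illustrative only.
SPLITS = {
    "12|34": ((0, 1), (2, 3)),
    "13|24": ((0, 2), (1, 3)),
    "14|23": ((0, 3), (1, 2)),
}


def flattening(seqs, split):
    """Build the 16 x 16 empirical flattening for one split.

    Rows are indexed by the pair of states at the first two taxa of the
    split, columns by the pair of states at the other two taxa. Entries
    are relative site-pattern frequencies.
    """
    (i, j), (k, l) = split
    F = np.zeros((16, 16))
    for site in zip(*seqs):
        if any(ch not in INDEX for ch in site):
            continue  # assumption: ignore gaps and ambiguity codes
        row = 4 * INDEX[site[i]] + INDEX[site[j]]
        col = 4 * INDEX[site[k]] + INDEX[site[l]]
        F[row, col] += 1
    total = F.sum()
    return F / total if total > 0 else F


def svd_score(F, rank=4):
    """Frobenius distance from F to the nearest matrix of the given rank.

    By the Eckart-Young theorem this equals the square root of the sum
    of the squared singular values beyond the first `rank`.
    """
    s = np.linalg.svd(F, compute_uv=False)
    return float(np.sqrt(np.sum(s[rank:] ** 2)))


def quartet_scores(seqs, rank=4):
    """Score all three quartet trees for four aligned sequences."""
    return {name: svd_score(flattening(seqs, sp), rank)
            for name, sp in SPLITS.items()}
```

Under these assumptions, the inferred quartet is the split with the lowest score, for example `min(scores, key=scores.get)` applied to the dictionary returned by `quartet_scores`.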