
Hypothesis Testing With Rank Conditions in Phylogenetics

A phylogenetic model of sequence evolution for a set of n taxa is a collection of probability distributions on the 4^n possible site patterns that may be observed in their aligned DNA sequences. For a four-taxon model, one can arrange the entries of these probability distributions into three flatte...


Bibliographic Details
Main authors:	Long, Colby; Kubatko, Laura
Format:	Online Article Text
Language:	English
Published:	Frontiers Media S.A. 2021
Subjects:	Genetics
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8283673/ https://www.ncbi.nlm.nih.gov/pubmed/34276772 http://dx.doi.org/10.3389/fgene.2021.664357

author	Long, Colby; Kubatko, Laura
collection	PubMed
description	A phylogenetic model of sequence evolution for a set of n taxa is a collection of probability distributions on the 4^n possible site patterns that may be observed in their aligned DNA sequences. For a four-taxon model, one can arrange the entries of these probability distributions into three flattening matrices that correspond to the three different unrooted leaf-labeled four-leaf trees, or quartet trees. The flattening matrix corresponding to the tree parameter of the model is known to satisfy certain rank conditions. Methods such as ErikSVD and SVDQuartets take advantage of this observation by applying singular value decomposition to flattening matrices consisting of empirical data. Each possible quartet is assigned an “SVD score” based on how close the flattening is to the set of matrices of the predicted rank. When choosing among possible quartets, the one with the lowest score is inferred to be the phylogeny of the four taxa under consideration. Since an n-leaf phylogenetic tree is determined by its quartets, this approach can be generalized to infer larger phylogenies. In this article, we explore using the SVD score as a test statistic to test whether phylogenetic data were generated by a particular quartet tree. To do so, we use several results to approximate the distribution of the SVD score and to give upper bounds on the p-value of the associated hypothesis tests. We also apply these hypothesis tests to simulated phylogenetic data and discuss the implications for interpreting SVD scores in rank-based inference methods.
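
The flattening and scoring steps described in the abstract can be illustrated with a minimal sketch. This is not the authors' code or the ErikSVD/SVDQuartets implementation. The names `SPLITS`, `flattening`, and `svd_score` are hypothetical, the default `rank=10` is an assumption (the abstract only says "the predicted rank"), and the input array is a toy distribution rather than real or simulated sequence data. The score is computed as the Frobenius distance to the nearest matrix of the given rank, which by the Eckart-Young theorem is the square root of the sum of the squared singular values beyond that rank.

```python
import numpy as np

# Axis orders that put the two taxa on one side of each split first.
# Taxa 1..4 correspond to array axes 0..3 (hypothetical convention).
SPLITS = {
    "12|34": (0, 1, 2, 3),
    "13|24": (0, 2, 1, 3),
    "14|23": (0, 3, 1, 2),
}


def flattening(P, split):
    """Arrange a 4x4x4x4 site-pattern array into the 16x16 flattening for a split."""
    return np.transpose(P, SPLITS[split]).reshape(16, 16)


def svd_score(P, split, rank=10):
    """Frobenius distance from the flattening to the nearest matrix of rank <= `rank`.

    rank=10 is an assumed default; pass the rank predicted by the model in use.
    """
    sigma = np.linalg.svd(flattening(P, split), compute_uv=False)
    return float(np.sqrt(np.sum(sigma[rank:] ** 2)))


# Toy distribution: taxa 1 and 2 always share a state, taxa 3 and 4 always
# share a state, and the two pairs are independent and uniform.
P = np.einsum("ab,cd->abcd", np.eye(4), np.eye(4)) / 16.0

scores = {split: svd_score(P, split) for split in SPLITS}
best = min(scores, key=scores.get)  # quartet with the lowest score
print(scores, best)
```

For this toy array the 12|34 flattening has rank 1, so its score is numerically zero, while the other two flattenings are full rank. With empirical site-pattern frequencies no score is exactly zero, which is why the article studies the distribution of the score.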
format	Online Article Text
id	pubmed-8283673
institution	National Center for Biotechnology Information
language	English
publishDate	2021
publisher	Frontiers Media S.A.
record_format	MEDLINE/PubMed
spelling	pubmed-8283673 2021-07-17 Hypothesis Testing With Rank Conditions in Phylogenetics Long, Colby; Kubatko, Laura Front Genet Genetics Frontiers Media S.A. 2021-07-02 /pmc/articles/PMC8283673/ /pubmed/34276772 http://dx.doi.org/10.3389/fgene.2021.664357 Text en Copyright © 2021 Long and Kubatko.
https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title	Hypothesis Testing With Rank Conditions in Phylogenetics
topic	Genetics
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8283673/ https://www.ncbi.nlm.nih.gov/pubmed/34276772 http://dx.doi.org/10.3389/fgene.2021.664357
