Hypothesis Testing With Rank Conditions in Phylogenetics
A phylogenetic model of sequence evolution for a set of n taxa is a collection of probability distributions on the 4^n possible site patterns that may be observed in their aligned DNA sequences. For a four-taxon model, one can arrange the entries of these probability distributions into three flatte...
Main authors: | Long, Colby; Kubatko, Laura |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Frontiers Media S.A. 2021 |
Subjects: | Genetics |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8283673/ https://www.ncbi.nlm.nih.gov/pubmed/34276772 http://dx.doi.org/10.3389/fgene.2021.664357 |
_version_ | 1783723254225043456 |
---|---|
author | Long, Colby Kubatko, Laura |
author_facet | Long, Colby Kubatko, Laura |
author_sort | Long, Colby |
collection | PubMed |
description | A phylogenetic model of sequence evolution for a set of n taxa is a collection of probability distributions on the 4^n possible site patterns that may be observed in their aligned DNA sequences. For a four-taxon model, one can arrange the entries of these probability distributions into three flattening matrices that correspond to the three different unrooted leaf-labeled four-leaf trees, or quartet trees. The flattening matrix corresponding to the tree parameter of the model is known to satisfy certain rank conditions. Methods such as ErikSVD and SVDQuartets take advantage of this observation by applying singular value decomposition to flattening matrices consisting of empirical data. Each possible quartet is assigned an “SVD score” based on how close the flattening is to the set of matrices of the predicted rank. When choosing among possible quartets, the one with the lowest score is inferred to be the phylogeny of the four taxa under consideration. Since an n-leaf phylogenetic tree is determined by its quartets, this approach can be generalized to infer larger phylogenies. In this article, we explore using the SVD score as a test statistic to test whether phylogenetic data were generated by a particular quartet tree. To do so, we use several results to approximate the distribution of the SVD score and to give upper bounds on the p-value of the associated hypothesis tests. We also apply these hypothesis tests to simulated phylogenetic data and discuss the implications for interpreting SVD scores in rank-based inference methods. |
format | Online Article Text |
id | pubmed-8283673 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | Frontiers Media S.A. |
record_format | MEDLINE/PubMed |
spelling | pubmed-8283673 2021-07-17 Hypothesis Testing With Rank Conditions in Phylogenetics Long, Colby Kubatko, Laura Front Genet Genetics Frontiers Media S.A. 2021-07-02 /pmc/articles/PMC8283673/ /pubmed/34276772 http://dx.doi.org/10.3389/fgene.2021.664357 Text en Copyright © 2021 Long and Kubatko. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms. |
spellingShingle | Genetics Long, Colby Kubatko, Laura Hypothesis Testing With Rank Conditions in Phylogenetics |
title | Hypothesis Testing With Rank Conditions in Phylogenetics |
title_full | Hypothesis Testing With Rank Conditions in Phylogenetics |
title_fullStr | Hypothesis Testing With Rank Conditions in Phylogenetics |
title_full_unstemmed | Hypothesis Testing With Rank Conditions in Phylogenetics |
title_short | Hypothesis Testing With Rank Conditions in Phylogenetics |
title_sort | hypothesis testing with rank conditions in phylogenetics |
topic | Genetics |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8283673/ https://www.ncbi.nlm.nih.gov/pubmed/34276772 http://dx.doi.org/10.3389/fgene.2021.664357 |
work_keys_str_mv | AT longcolby hypothesistestingwithrankconditionsinphylogenetics AT kubatkolaura hypothesistestingwithrankconditionsinphylogenetics |
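
The description field above outlines how rank-based methods score a quartet: site-pattern frequencies for four taxa are arranged into one flattening matrix per quartet tree, and each matrix is scored by its distance to the set of matrices of the predicted rank. The following is a minimal illustrative sketch of that idea in Python with NumPy, not the authors' implementation. The function names, the split labels, and the default `rank=10` are assumptions made for illustration; the predicted rank depends on the model, and the actual methods may normalize the score differently. The score computed here is the Frobenius distance to the nearest matrix of rank at most `rank`, which by the Eckart-Young theorem equals the square root of the sum of the squared trailing singular values.

```python
import numpy as np

STATES = "ACGT"
INDEX = {s: i for i, s in enumerate(STATES)}

# The three unrooted quartet trees on taxa 0, 1, 2, 3, written as splits.
# The labels are hypothetical names chosen for this sketch.
SPLITS = {
    "01|23": ((0, 1), (2, 3)),
    "02|13": ((0, 2), (1, 3)),
    "03|12": ((0, 3), (1, 2)),
}


def pattern_tensor(seqs):
    """Return the 4x4x4x4 array of empirical site-pattern frequencies.

    `seqs` holds four aligned DNA strings, one per taxon.
    """
    counts = np.zeros((4, 4, 4, 4))
    for column in zip(*seqs):
        # Skip alignment columns containing gaps or ambiguity codes.
        if all(base in INDEX for base in column):
            counts[tuple(INDEX[base] for base in column)] += 1
    total = counts.sum()
    return counts / total if total else counts


def flattening(tensor, split):
    """Return the 16x16 flattening for a split: rows are indexed by the
    states of the first taxon pair, columns by those of the second pair."""
    (a, b), (c, d) = split
    return np.transpose(tensor, (a, b, c, d)).reshape(16, 16)


def svd_score(matrix, rank=10):
    """Frobenius distance from `matrix` to the nearest matrix of rank <= rank.

    The default rank is an assumption; set it to the rank predicted by
    the model under study.
    """
    sigma = np.linalg.svd(matrix, compute_uv=False)  # sorted descending
    return float(np.sqrt(np.sum(sigma[rank:] ** 2)))


def score_quartets(seqs, rank=10):
    """Score all three quartet trees; the lowest score is the inferred tree."""
    tensor = pattern_tensor(seqs)
    return {
        name: svd_score(flattening(tensor, split), rank)
        for name, split in SPLITS.items()
    }
```

Calling `score_quartets(seqs)` on four aligned sequences returns one score per split, and `min(scores, key=scores.get)` picks the quartet with the lowest score. Turning a score into a p-value bound, as the article proposes, requires the distributional results developed there and is not attempted in this sketch.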