Cargando…

An Algorithm for Calculating the Probability of Classes of Data Patterns on a Genealogy

Felsenstein's pruning algorithm allows one to calculate the probability of any particular data pattern arising on a phylogeny given a model of character evolution. Here we present a similar dynamic programming algorithm. Our algorithm treats the tree and model as known. The algorithm makes it f...

Descripción completa

Detalles Bibliográficos
Autores principales:	Koch, Jordan M., Holder, Mark T.
Formato:	Online Artículo Texto
Lenguaje:	English
Publicado:	Public Library of Science 2012
Materias:	Algorithms/Software
Acceso en línea:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3712476/ https://www.ncbi.nlm.nih.gov/pubmed/23868168 http://dx.doi.org/10.1371/4fd1286980c08

_version_	1782277075778928640
author	Koch, Jordan M. Holder, Mark T.
author_facet	Koch, Jordan M. Holder, Mark T.
author_sort	Koch, Jordan M.
collection	PubMed
description	Felsenstein's pruning algorithm allows one to calculate the probability of any particular data pattern arising on a phylogeny given a model of character evolution. Here we present a similar dynamic programming algorithm. Our algorithm treats the tree and model as known. The algorithm makes it feasible to calculate the probability that a randomly selected character will be a member of a particular class of character patterns. Specifically, we are interested in binning patterns by the number of parsimony steps and the set of states observed at the tips of the tree. This algorithm was developed to expand the range of data set sizes that can be used with Waddell et al.'s marginal testing approach for assessing the adequacy of a model. The algorithms introduced can also be used in likelihood calculations which correct for ascertainment biases. For example, Lewis introduced an Mkv model which corrects for the lack of constant sites. The probability of a constant pattern arising can be calculated using the algorithm that we present, or by enumerating all possible constant patterns and calculating the probability of each one. Because the number of constant data patterns is small, both methods are efficient. However, elaborations of the Mkv model (such as those in Nylander et al) require calculating the probability of parsimony-uninformative patterns arising. For large trees and characters with many possible character states, the number of possible parismony-uninformative patterns is immense. In these cases, the algorithms introduced here will be more efficient. The algorithm has been implemented in open source software written in C++.
format	Online Article Text
id	pubmed-3712476
institution	National Center for Biotechnology Information
language	English
publishDate	2012
publisher	Public Library of Science
record_format	MEDLINE/PubMed
spelling	pubmed-37124762013-07-17 An Algorithm for Calculating the Probability of Classes of Data Patterns on a Genealogy Koch, Jordan M. Holder, Mark T. PLoS Curr Algorithms/Software Felsenstein's pruning algorithm allows one to calculate the probability of any particular data pattern arising on a phylogeny given a model of character evolution. Here we present a similar dynamic programming algorithm. Our algorithm treats the tree and model as known. The algorithm makes it feasible to calculate the probability that a randomly selected character will be a member of a particular class of character patterns. Specifically, we are interested in binning patterns by the number of parsimony steps and the set of states observed at the tips of the tree. This algorithm was developed to expand the range of data set sizes that can be used with Waddell et al.'s marginal testing approach for assessing the adequacy of a model. The algorithms introduced can also be used in likelihood calculations which correct for ascertainment biases. For example, Lewis introduced an Mkv model which corrects for the lack of constant sites. The probability of a constant pattern arising can be calculated using the algorithm that we present, or by enumerating all possible constant patterns and calculating the probability of each one. Because the number of constant data patterns is small, both methods are efficient. However, elaborations of the Mkv model (such as those in Nylander et al) require calculating the probability of parsimony-uninformative patterns arising. For large trees and characters with many possible character states, the number of possible parismony-uninformative patterns is immense. In these cases, the algorithms introduced here will be more efficient. The algorithm has been implemented in open source software written in C++. Public Library of Science 2012-12-14 /pmc/articles/PMC3712476/ /pubmed/23868168 http://dx.doi.org/10.1371/4fd1286980c08 Text en http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
spellingShingle	Algorithms/Software Koch, Jordan M. Holder, Mark T. An Algorithm for Calculating the Probability of Classes of Data Patterns on a Genealogy
title	An Algorithm for Calculating the Probability of Classes of Data Patterns on a Genealogy
title_full	An Algorithm for Calculating the Probability of Classes of Data Patterns on a Genealogy
title_fullStr	An Algorithm for Calculating the Probability of Classes of Data Patterns on a Genealogy
title_full_unstemmed	An Algorithm for Calculating the Probability of Classes of Data Patterns on a Genealogy
title_short	An Algorithm for Calculating the Probability of Classes of Data Patterns on a Genealogy
title_sort	algorithm for calculating the probability of classes of data patterns on a genealogy
topic	Algorithms/Software
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3712476/ https://www.ncbi.nlm.nih.gov/pubmed/23868168 http://dx.doi.org/10.1371/4fd1286980c08
work_keys_str_mv	AT kochjordanm analgorithmforcalculatingtheprobabilityofclassesofdatapatternsonagenealogy AT holdermarkt analgorithmforcalculatingtheprobabilityofclassesofdatapatternsonagenealogy AT kochjordanm algorithmforcalculatingtheprobabilityofclassesofdatapatternsonagenealogy AT holdermarkt algorithmforcalculatingtheprobabilityofclassesofdatapatternsonagenealogy

An Algorithm for Calculating the Probability of Classes of Data Patterns on a Genealogy

Ejemplares similares