Statistical tests to identify appropriate types of nucleotide sequence recoding in molecular phylogenetics
BACKGROUND: Under a Markov model of evolution, recoding, or lumping, of the four nucleotides into fewer groups may permit analysis under simpler conditions but may unfortunately yield misleading results unless the evolutionary process of the recoded groups remains Markovian. If a Markov process is l...
Main authors: | Vera-Ruiz, Victor A; Lau, Kwok W; Robinson, John; Jermiin, Lars S |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2014 |
Subjects: | Proceedings |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4015178/ https://www.ncbi.nlm.nih.gov/pubmed/24564837 http://dx.doi.org/10.1186/1471-2105-15-S2-S8 |
_version_ | 1782315295058165760 |
---|---|
author | Vera-Ruiz, Victor A Lau, Kwok W Robinson, John Jermiin, Lars S |
author_sort | Vera-Ruiz, Victor A |
collection | PubMed |
description | BACKGROUND: Under a Markov model of evolution, recoding, or lumping, of the four nucleotides into fewer groups may permit analysis under simpler conditions but may unfortunately yield misleading results unless the evolutionary process of the recoded groups remains Markovian. If a Markov process is lumpable, then the evolutionary process of the recoded groups is Markovian. RESULTS: We consider stationary, reversible, and homogeneous Markov processes on two taxa and compare three tests for lumpability: one using an ad hoc test statistic, which is based on an index that is evaluated using a bootstrap approximation of its distribution; one that is based on a test proposed specifically for Markov chains; and one using a likelihood-ratio test. We show that the likelihood-ratio test is more powerful than the index test, which is more powerful than that based on the Markov chain test statistic. We also show that for stationary processes on binary trees with more than two taxa, the tests can be applied to all pairs. Finally, we show that if the process is lumpable, then estimates obtained under the recoded model agree with estimates obtained under the original model, whereas, if the process is not lumpable, then these estimates can differ substantially. We apply the new likelihood-ratio test for lumpability to two primate data sets, one with a mitochondrial origin and one with a nuclear origin. CONCLUSIONS: Recoding may result in biased phylogenetic estimates because the original evolutionary process is not lumpable. Accordingly, testing for lumpability should be done prior to phylogenetic analysis of recoded data. |
format | Online Article Text |
id | pubmed-4015178 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2014 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-4015178 2014-05-23 Statistical tests to identify appropriate types of nucleotide sequence recoding in molecular phylogenetics Vera-Ruiz, Victor A Lau, Kwok W Robinson, John Jermiin, Lars S BMC Bioinformatics Proceedings
BioMed Central 2014-01-31 /pmc/articles/PMC4015178/ /pubmed/24564837 http://dx.doi.org/10.1186/1471-2105-15-S2-S8 Text en Copyright © 2014 Vera-Ruiz et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Proceedings Vera-Ruiz, Victor A Lau, Kwok W Robinson, John Jermiin, Lars S Statistical tests to identify appropriate types of nucleotide sequence recoding in molecular phylogenetics |
title | Statistical tests to identify appropriate types of nucleotide sequence recoding in molecular phylogenetics |
topic | Proceedings |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4015178/ https://www.ncbi.nlm.nih.gov/pubmed/24564837 http://dx.doi.org/10.1186/1471-2105-15-S2-S8 |
work_keys_str_mv | AT veraruizvictora statisticalteststoidentifyappropriatetypesofnucleotidesequencerecodinginmolecularphylogenetics AT laukwokw statisticalteststoidentifyappropriatetypesofnucleotidesequencerecodinginmolecularphylogenetics AT robinsonjohn statisticalteststoidentifyappropriatetypesofnucleotidesequencerecodinginmolecularphylogenetics AT jermiinlarss statisticalteststoidentifyappropriatetypesofnucleotidesequencerecodinginmolecularphylogenetics |
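
Illustrative note (not part of the catalogued record): the description above turns on whether a Markov process is lumpable under a given recoding. The sketch below is not one of the three statistical tests compared in the article. It only checks the classical algebraic condition for strong lumpability on a known discrete-time transition matrix: for every pair of groups, the probability of moving from a state into the target group must be identical for all states in the source group. The state order (A, C, G, T), the purine/pyrimidine (RY) recoding, the function names, and the numerical matrices are all assumptions chosen for illustration.

```python
# Minimal sketch, assuming a known 4x4 transition matrix with states
# ordered A, C, G, T. It checks the algebraic lumpability condition only;
# it is NOT the likelihood-ratio, index, or Markov chain test of the article.

def is_lumpable(P, partition, tol=1e-9):
    """Return True if P is (strongly) lumpable with respect to partition.

    For every source block and target block, the summed probability of
    jumping into the target block must be the same for each state in the
    source block.
    """
    for source in partition:
        for target in partition:
            sums = [sum(P[i][j] for j in target) for i in source]
            if max(sums) - min(sums) > tol:
                return False
    return True


def lump(P, partition):
    """Build the recoded transition matrix (meaningful only if lumpable).

    The first state of each block is used as its representative.
    """
    return [[sum(P[src[0]][j] for j in tgt) for tgt in partition]
            for src in partition]


# Hypothetical RY recoding: purines {A, G} and pyrimidines {C, T}.
RY = [[0, 2], [1, 3]]

# Hypothetical Kimura-style matrix: transitions 0.10, transversions 0.05.
P_K80 = [
    [0.80, 0.05, 0.10, 0.05],  # from A
    [0.05, 0.80, 0.05, 0.10],  # from C
    [0.10, 0.05, 0.80, 0.05],  # from G
    [0.05, 0.10, 0.05, 0.80],  # from T
]

# Hypothetical matrix in which A and G leave the purine group at
# different rates (0.20 versus 0.10), which violates the condition.
P_BAD = [
    [0.70, 0.10, 0.10, 0.10],  # from A
    [0.05, 0.80, 0.05, 0.10],  # from C
    [0.10, 0.05, 0.80, 0.05],  # from G
    [0.05, 0.10, 0.05, 0.80],  # from T
]

if __name__ == "__main__":
    print("K80-style matrix lumpable under RY:", is_lumpable(P_K80, RY))
    print("Second matrix lumpable under RY:", is_lumpable(P_BAD, RY))
    print("Recoded K80-style matrix:", lump(P_K80, RY))
```

In practice the transition matrix is unknown and must be estimated from aligned sequences, which is why the article frames lumpability as a hypothesis to be tested statistically rather than checked exactly as done here.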