
Statistical tests to identify appropriate types of nucleotide sequence recoding in molecular phylogenetics

Bibliographic Details
Main authors: Vera-Ruiz, Victor A; Lau, Kwok W; Robinson, John; Jermiin, Lars S
Format: Online Article Text
Language: English
Published: BioMed Central 2014
Subjects: Proceedings
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4015178/
https://www.ncbi.nlm.nih.gov/pubmed/24564837
http://dx.doi.org/10.1186/1471-2105-15-S2-S8
Description:
BACKGROUND: Under a Markov model of evolution, recoding, or lumping, of the four nucleotides into fewer groups may permit analysis under simpler conditions but may unfortunately yield misleading results unless the evolutionary process of the recoded groups remains Markovian. If a Markov process is lumpable, then the evolutionary process of the recoded groups is Markovian.
RESULTS: We consider stationary, reversible, and homogeneous Markov processes on two taxa and compare three tests for lumpability: one using an ad hoc test statistic, which is based on an index that is evaluated using a bootstrap approximation of its distribution; one that is based on a test proposed specifically for Markov chains; and one using a likelihood-ratio test. We show that the likelihood-ratio test is more powerful than the index test, which is more powerful than that based on the Markov chain test statistic. We also show that for stationary processes on binary trees with more than two taxa, the tests can be applied to all pairs. Finally, we show that if the process is lumpable, then estimates obtained under the recoded model agree with estimates obtained under the original model, whereas, if the process is not lumpable, then these estimates can differ substantially. We apply the new likelihood-ratio test for lumpability to two primate data sets, one with a mitochondrial origin and one with a nuclear origin.
CONCLUSIONS: Recoding may result in biased phylogenetic estimates because the original evolutionary process is not lumpable. Accordingly, testing for lumpability should be done prior to phylogenetic analysis of recoded data.
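The definition of lumpability in the abstract can be made concrete with a small check. The sketch below is an illustration under stated assumptions, not the authors' tests: it applies the classical strong-lumpability row-sum condition of Kemeny and Snell to a known 4×4 nucleotide rate matrix Q. A recoding (a partition of A, C, G, T into groups) passes if, for every pair of distinct groups, all nucleotides in the source group have the same total rate into the target group. The helper names `hky_rate_matrix` and `is_lumpable`, the HKY-style matrix, and the base frequencies and κ value are hypothetical choices made only for this example.

```python
from itertools import product

BASES = "ACGT"
# Transition pairs: purine<->purine and pyrimidine<->pyrimidine.
TRANSITIONS = {frozenset("AG"), frozenset("CT")}


def hky_rate_matrix(pi, kappa):
    """Hypothetical helper: HKY-style rate matrix with Q[i][j] = pi[j] * kappa
    for transitions and pi[j] for transversions (rows/columns ordered A, C, G, T)."""
    q = [[0.0] * 4 for _ in range(4)]
    for i, a in enumerate(BASES):
        for j, b in enumerate(BASES):
            if i != j:
                rate = kappa if frozenset((a, b)) in TRANSITIONS else 1.0
                q[i][j] = pi[j] * rate
        # Diagonal entry makes the row sum to zero.
        q[i][i] = -sum(q[i])
    return q


def is_lumpable(q, partition, tol=1e-12):
    """Hypothetical helper: True if q satisfies the strong-lumpability
    row-sum condition for the given partition, e.g. ["AG", "CT"]."""
    if sorted("".join(partition)) != sorted(BASES):
        raise ValueError("partition must use each of A, C, G, T exactly once")
    index = {b: k for k, b in enumerate(BASES)}
    for src, dst in product(partition, repeat=2):
        if src == dst:
            # Within-group sums follow from the off-group sums,
            # because every row of q sums to zero.
            continue
        # Total rate from each state in src into the group dst.
        totals = [sum(q[index[i]][index[j]] for j in dst) for i in src]
        if max(totals) - min(totals) > tol:
            return False
    return True


if __name__ == "__main__":
    pi = [0.1, 0.2, 0.3, 0.4]  # made-up frequencies for A, C, G, T
    q = hky_rate_matrix(pi, kappa=2.0)
    print(is_lumpable(q, ["AG", "CT"]))  # RY-coding: True
    print(is_lumpable(q, ["AC", "GT"]))  # False
```

Run as a script, this prints True for RY-coding (purines A, G versus pyrimidines C, T) and False for the AC/GT grouping: with unequal G and T frequencies and κ ≠ 1, the total rates from A and from C into {G, T} differ (1.0 versus 1.1). With a rate matrix estimated from sequence data the sums will rarely be exactly equal, which is why a statistical test, such as those compared in the paper, is needed rather than this exact check.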
Record: pubmed-4015178 (PubMed, National Center for Biotechnology Information; MEDLINE/PubMed format)
Journal: BMC Bioinformatics (Proceedings); record date 2014-05-23
Publisher: BioMed Central, 2014-01-31
Copyright © 2014 Vera-Ruiz et al.; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.