
Modelling the ancestral sequence distribution and model frequencies in context-dependent models for primate non-coding sequences

BACKGROUND: Recent approaches for context-dependent evolutionary modelling assume that the evolution of a given site depends upon its ancestor and that ancestor's immediate flanking sites. Because such a dependency pattern cannot be imposed on the root sequence, we consider the use of different o...


Bibliographic Details
Main Authors: Baele, Guy; Van de Peer, Yves; Vansteelandt, Stijn
Format: Text
Language: English
Published: BioMed Central 2010
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2928787/
https://www.ncbi.nlm.nih.gov/pubmed/20698960
http://dx.doi.org/10.1186/1471-2148-10-244
_version_ 1782185886852579328
author Baele, Guy
Van de Peer, Yves
Vansteelandt, Stijn
author_facet Baele, Guy
Van de Peer, Yves
Vansteelandt, Stijn
author_sort Baele, Guy
collection PubMed
description BACKGROUND: Recent approaches for context-dependent evolutionary modelling assume that the evolution of a given site depends upon its ancestor and that ancestor's immediate flanking sites. Because such a dependency pattern cannot be imposed on the root sequence, we consider the use of different orders of Markov chains to model dependence at the ancestral root sequence. Root distributions which are coupled to the context-dependent model across the underlying phylogenetic tree are deemed more realistic than decoupled Markov chain models, as the evolutionary process is responsible for shaping the composition of the ancestral root sequence.
RESULTS: We find strong support, in terms of Bayes Factors, for using a second-order Markov chain at the ancestral root sequence along with a context-dependent model throughout the remainder of the phylogenetic tree in an ancestral repeats dataset, and for using a first-order Markov chain at the ancestral root sequence in a pseudogene dataset. Relaxing the assumption of a single context-independent set of independent model frequencies, as presented in previous work, yields a further drastic increase in model fit. We show that the substitution rates associated with the CpG-methylation-deamination process can be modelled through context-dependent model frequencies and that their accuracy depends on the (order of the) Markov chain imposed at the ancestral root sequence. In addition, we provide evidence that this approach (which assumes that root distribution and evolutionary model are decoupled) outperforms an approach inspired by the work of Arndt et al., where the root distribution is coupled to the evolutionary model. We show that the continuous-time approximation of Hwang and Green has stronger support in terms of Bayes Factors, but the parameter estimates show minimal differences.
CONCLUSIONS: We show that the combination of a dependency scheme at the ancestral root sequence and a context-dependent evolutionary model across the remainder of the tree allows for accurate estimation of the model's parameters. The different assumptions tested in this manuscript clearly show that designing accurate context-dependent models is a complex process, with many different assumptions that require validation. Further, these assumptions are shown to change across different datasets, making the search for an adequate model for a given dataset quite challenging.
format Text
id pubmed-2928787
institution National Center for Biotechnology Information
language English
publishDate 2010
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-2928787 2010-08-27 Modelling the ancestral sequence distribution and model frequencies in context-dependent models for primate non-coding sequences Baele, Guy Van de Peer, Yves Vansteelandt, Stijn BMC Evol Biol Research Article BioMed Central 2010-08-10 /pmc/articles/PMC2928787/ /pubmed/20698960 http://dx.doi.org/10.1186/1471-2148-10-244 Text en Copyright ©2010 Baele et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Research Article
Baele, Guy
Van de Peer, Yves
Vansteelandt, Stijn
Modelling the ancestral sequence distribution and model frequencies in context-dependent models for primate non-coding sequences
title Modelling the ancestral sequence distribution and model frequencies in context-dependent models for primate non-coding sequences
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2928787/
https://www.ncbi.nlm.nih.gov/pubmed/20698960
http://dx.doi.org/10.1186/1471-2148-10-244