Efficient context-dependent model building based on clustering posterior distributions for non-coding sequences

BACKGROUND: Many recent studies that relax the assumption of independent evolution of sites have done so at the expense of a drastic increase in the number of substitution parameters. While additional parameters cannot be avoided to model context-dependent evolution, a large increase in model dimens...

Bibliographic Details
Main authors:	Baele, Guy; Van de Peer, Yves; Vansteelandt, Stijn
Format:	Text
Language:	English
Published:	BioMed Central 2009
Subjects:	Research Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2695821/ https://www.ncbi.nlm.nih.gov/pubmed/19405957 http://dx.doi.org/10.1186/1471-2148-9-87

author	Baele, Guy; Van de Peer, Yves; Vansteelandt, Stijn
author_sort	Baele, Guy
collection	PubMed
description	BACKGROUND: Many recent studies that relax the assumption of independent evolution of sites have done so at the expense of a drastic increase in the number of substitution parameters. While additional parameters cannot be avoided to model context-dependent evolution, a large increase in model dimensionality is only justified when accompanied with careful model-building strategies that guard against overfitting. An increased dimensionality leads to increases in numerical computations of the models, increased convergence times in Bayesian Markov chain Monte Carlo algorithms and even more tedious Bayes Factor calculations. RESULTS: We have developed two model-search algorithms which reduce the number of Bayes Factor calculations by clustering posterior densities to decide on the equality of substitution behavior in different contexts. The selected model's fit is evaluated using a Bayes Factor, which we calculate via model-switch thermodynamic integration. To reduce computation time and to increase the precision of this integration, we propose to split the calculations over different computers and to appropriately calibrate the individual runs. Using the proposed strategies, we find, in a dataset of primate Ancestral Repeats, that careful modeling of context-dependent evolution may increase model fit considerably and that the combination of a context-dependent model with the assumption of varying rates across sites offers even larger improvements in terms of model fit. Using a smaller nuclear SSU rRNA dataset, we show that context-dependence may only become detectable upon applying model-building strategies. CONCLUSION: While context-dependent evolutionary models can increase the model fit over traditional independent evolutionary models, such complex models will often contain too many parameters. Justification for the added parameters is thus required so that only those parameters that model evolutionary processes previously unaccounted for are added to the evolutionary model. To obtain an optimal balance between the number of parameters in a context-dependent model and the performance in terms of model fit, we have designed two parameter-reduction strategies and we have shown that model fit can be greatly improved by reducing the number of parameters in a context-dependent evolutionary model.
format	Text
id	pubmed-2695821
institution	National Center for Biotechnology Information
language	English
publishDate	2009
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-2695821 2009-06-13 Efficient context-dependent model building based on clustering posterior distributions for non-coding sequences Baele, Guy; Van de Peer, Yves; Vansteelandt, Stijn BMC Evol Biol Research Article BioMed Central 2009-04-30 /pmc/articles/PMC2695821/ /pubmed/19405957 http://dx.doi.org/10.1186/1471-2148-9-87 Text en Copyright © 2009 Baele et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title	Efficient context-dependent model building based on clustering posterior distributions for non-coding sequences
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2695821/ https://www.ncbi.nlm.nih.gov/pubmed/19405957 http://dx.doi.org/10.1186/1471-2148-9-87

Similar items