Counting and sampling gene family evolutionary histories in the duplication-loss and duplication-loss-transfer models

Given a set of species whose evolution is represented by a species tree, a gene family is a group of genes having evolved from a single ancestral gene. A gene family evolves along the branches of a species tree through various mechanisms, including—but not limited to—speciation ([Formula: see text]...

Bibliographic Details
Main authors: Chauve, Cedric; Ponty, Yann; Wallner, Michael
Format: Online Article Text
Language: English
Published: Springer Berlin Heidelberg 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7052048/
https://www.ncbi.nlm.nih.gov/pubmed/32060618
http://dx.doi.org/10.1007/s00285-019-01465-x
_version_ 1783502786097315840
author Chauve, Cedric
Ponty, Yann
Wallner, Michael
collection PubMed
description Given a set of species whose evolution is represented by a species tree, a gene family is a group of genes having evolved from a single ancestral gene. A gene family evolves along the branches of a species tree through various mechanisms, including, but not limited to, speciation (S), gene duplication (D), gene loss (L), and horizontal gene transfer (T). The reconstruction of a gene tree representing the evolution of a gene family constrained by a species tree is an important problem in phylogenomics. However, unlike in the multispecies coalescent evolutionary model, which considers only speciation and incomplete lineage sorting events, very little is known about the search space for gene family histories accounting for gene duplication, gene loss and horizontal gene transfer (the DLT-model). In this work, we introduce the notion of an evolutionary history, defined as a binary ordered rooted tree describing the evolution of a gene family, constrained by a species tree in the DLT-model. We provide formal grammars describing the set of all evolutionary histories that are compatible with a given species tree, whether it is ranked or unranked. These grammars allow us, using either analytic combinatorics or dynamic programming, to efficiently compute the number of histories of a given size, and also to generate random histories of a given size under the uniform distribution. We apply these tools to obtain exact asymptotics for the number of gene family histories for two species trees, the rooted caterpillar and the complete binary tree, as well as estimates of the range of the exponential growth factor of the number of histories for random species trees of size up to 25. Our results show that including horizontal gene transfers induces a dramatic increase in the number of evolutionary histories. We also show that, within ranked species trees, the number of evolutionary histories in the DLT-model is almost independent of the species tree topology. These results establish firm foundations for the development of ensemble methods for the prediction of reconciliations.
format Online
Article
Text
id pubmed-7052048
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Springer Berlin Heidelberg
record_format MEDLINE/PubMed
spelling pubmed-7052048 2020-03-16 Counting and sampling gene family evolutionary histories in the duplication-loss and duplication-loss-transfer models Chauve, Cedric Ponty, Yann Wallner, Michael J Math Biol Article Springer Berlin Heidelberg 2020-02-15 2020 /pmc/articles/PMC7052048/ /pubmed/32060618 http://dx.doi.org/10.1007/s00285-019-01465-x Text en © The Author(s) 2020 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
title Counting and sampling gene family evolutionary histories in the duplication-loss and duplication-loss-transfer models
topic Article