
Robustness of birth-death and gain models for inferring evolutionary events

Bibliographic Details
Main Authors: Stolzer, Maureen; Wasserman, Larry; Durand, Dannie
Format: Online Article Text
Language: English
Published: BioMed Central 2014
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4239551/
https://www.ncbi.nlm.nih.gov/pubmed/25572914
http://dx.doi.org/10.1186/1471-2164-15-S6-S9
author Stolzer, Maureen
Wasserman, Larry
Durand, Dannie
collection PubMed
description BACKGROUND: Phylogenetic birth-death models are opening a new window on the processes of genome evolution in studies of the evolution of gene and protein families, protein-protein interaction networks, microRNAs, and copy number variation. Given a species tree and a set of genomic characters in present-day species, the birth-death approach estimates the most likely rates required to explain the observed data and returns the expected ancestral character states and the history of character state changes. Achieving a balance between model complexity and generalizability is a fundamental challenge in the application of birth-death models. While more parameters promise greater accuracy and more biologically realistic models, increasing model complexity can lead to overfitting and a heavy computational cost.
RESULTS: Here we present a systematic, empirical investigation of these tradeoffs, using protein domain families in six metazoan genomes as a case study. We compared models of increasing complexity, implemented in the Count program, with respect to model fit, robustness, and stability. In addition, we used a bootstrapping procedure to assess estimator variability. The results show that the most complex model, which allows for both branch-specific and family-specific rate variation, achieves the best fit, without overfitting. Variance remains low with increasing complexity, except for family-specific loss rates. This variance is reduced when the number of discrete rate categories is increased. Model choice is of greatest concern when different models lead to fundamentally different outcomes. To investigate the extent to which model choice influences biological interpretation, ancestral states and expected events were inferred under each model. Disturbingly, the different models not only resulted in quantitatively different histories, but predicted qualitatively different patterns of domain family turnover and genome expansion and reduction.
CONCLUSIONS: The work presented here evaluates model choice for genomic birth-death models in a systematic way and presents the first use of bootstrapping to assess estimator variance in birth-death models. We find that a model incorporating both lineage and family rate variation yields more accurate estimators without sacrificing generality. Our results indicate that model choice can lead to fundamentally different evolutionary conclusions, emphasizing the importance of more biologically realistic and complex models.
format Online
Article
Text
id pubmed-4239551
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4239551 2014-11-25 Robustness of birth-death and gain models for inferring evolutionary events Stolzer, Maureen Wasserman, Larry Durand, Dannie BMC Genomics Research BioMed Central 2014-10-17 /pmc/articles/PMC4239551/ /pubmed/25572914 http://dx.doi.org/10.1186/1471-2164-15-S6-S9 Text en Copyright © 2014 Stolzer et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/4.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Robustness of birth-death and gain models for inferring evolutionary events
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4239551/
https://www.ncbi.nlm.nih.gov/pubmed/25572914
http://dx.doi.org/10.1186/1471-2164-15-S6-S9