Robustness of birth-death and gain models for inferring evolutionary events
Main authors: | Stolzer, Maureen; Wasserman, Larry; Durand, Dannie |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central, 2014 |
Subjects: | Research |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4239551/ https://www.ncbi.nlm.nih.gov/pubmed/25572914 http://dx.doi.org/10.1186/1471-2164-15-S6-S9 |
author | Stolzer, Maureen; Wasserman, Larry; Durand, Dannie |
---|---|
collection | PubMed |
description | BACKGROUND: Phylogenetic birth-death models are opening a new window on the processes of genome evolution in studies of the evolution of gene and protein families, protein-protein interaction networks, microRNAs, and copy number variation. Given a species tree and a set of genomic characters in present-day species, the birth-death approach estimates the most likely rates required to explain the observed data and returns the expected ancestral character states and the history of character state changes. Achieving a balance between model complexity and generalizability is a fundamental challenge in the application of birth-death models. While more parameters promise greater accuracy and more biologically realistic models, increasing model complexity can lead to overfitting and a heavy computational cost. RESULTS: Here we present a systematic, empirical investigation of these tradeoffs, using protein domain families in six metazoan genomes as a case study. We compared models of increasing complexity, implemented in the Count program, with respect to model fit, robustness, and stability. In addition, we used a bootstrapping procedure to assess estimator variability. The results show that the most complex model, which allows for both branch-specific and family-specific rate variation, achieves the best fit, without overfitting. Variance remains low with increasing complexity, except for family-specific loss rates. This variance is reduced when the number of discrete rate categories is increased. Model choice is of greatest concern when different models lead to fundamentally different outcomes. To investigate the extent to which model choice influences biological interpretation, ancestral states and expected events were inferred under each model. Disturbingly, the different models not only resulted in quantitatively different histories, but predicted qualitatively different patterns of domain family turnover and genome expansion and reduction. 
CONCLUSIONS: The work presented here evaluates model choice for genomic birth-death models in a systematic way and presents the first use of bootstrapping to assess estimator variance in birth-death models. We find that a model incorporating both lineage and family rate variation yields more accurate estimators without sacrificing generality. Our results indicate that model choice can lead to fundamentally different evolutionary conclusions, emphasizing the importance of more biologically realistic and complex models. |
format | Online Article Text |
id | pubmed-4239551 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2014 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
source | pubmed-4239551 (2014-11-25). BMC Genomics, Research. BioMed Central, 2014-10-17 |
rights | Copyright © 2014 Stolzer et al.; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
title | Robustness of birth-death and gain models for inferring evolutionary events |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4239551/ https://www.ncbi.nlm.nih.gov/pubmed/25572914 http://dx.doi.org/10.1186/1471-2164-15-S6-S9 |
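The abstract describes gain, duplication (birth) and loss (death) rates estimated for domain families along the branches of a species tree, plus a bootstrap used to gauge estimator variability. Purely as an illustration, the Python sketch below assumes the standard linear birth-death process with a gain (immigration) term on a single branch: a family of size n gains a copy at rate nλ + κ and loses one at rate nμ. It checks a Gillespie simulation against the closed-form expected size, E(t) = x0·e^{(λ−μ)t} + κ/(λ−μ)·(e^{(λ−μ)t} − 1), and resamples families with replacement to obtain a bootstrap standard error of the mean. The function names, parameter values and the use of the mean as the estimator are hypothetical; the sketch does not reproduce Count's maximum-likelihood rate estimation or the paper's bootstrapping procedure.

```python
import math
import random
import statistics


def expected_size(x0, lam, mu, kappa, t):
    """Expected family size after time t under an assumed linear birth-death
    process with gain: n -> n+1 at rate n*lam + kappa, n -> n-1 at rate n*mu.

    Solves dE/dt = (lam - mu) * E + kappa with E(0) = x0.
    """
    r = lam - mu
    if abs(r) < 1e-12:
        # Critical case lam == mu: duplications and losses cancel on average,
        # so the mean grows only through gains.
        return x0 + kappa * t
    g = math.exp(r * t)
    return x0 * g + (kappa / r) * (g - 1.0)


def simulate_branch(x0, lam, mu, kappa, t, rng):
    """Gillespie simulation of one family's size along a branch of length t."""
    n, clock = x0, 0.0
    while True:
        birth = n * lam + kappa  # duplication plus gain
        death = n * mu           # loss
        total = birth + death
        if total == 0.0:
            return n             # no event can occur: size is frozen
        clock += rng.expovariate(total)  # waiting time to the next event
        if clock > t:
            return n
        # Pick the event type in proportion to its rate.
        if rng.random() * total < birth:
            n += 1
        else:
            n -= 1


def bootstrap_se(values, n_boot, rng):
    """Nonparametric bootstrap standard error of the mean family size,
    resampling whole families with replacement (n_boot must be >= 2)."""
    k = len(values)
    means = [sum(rng.choices(values, k=k)) / k for _ in range(n_boot)]
    return statistics.stdev(means)


if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical rates and branch length, chosen only for illustration.
    x0, lam, mu, kappa, t = 2, 0.3, 0.2, 0.05, 1.5
    sizes = [simulate_branch(x0, lam, mu, kappa, t, rng) for _ in range(5000)]
    mean = sum(sizes) / len(sizes)
    print(f"simulated mean  {mean:.3f}")
    print(f"analytic mean   {expected_size(x0, lam, mu, kappa, t):.3f}")
    print(f"bootstrap SE    {bootstrap_se(sizes, 200, rng):.3f}")
```

Resampling whole families, rather than individual gene copies, keeps each family's history intact. Extending the sketch to a full species tree would, under the same assumptions, mean running `simulate_branch` down each branch starting from the size at the parent node.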