So many genes, so little time: A practical approach to divergence-time estimation in the genomic era
Main authors: | Smith, Stephen A.; Brown, Joseph W.; Walker, Joseph F. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Public Library of Science, 2018 |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5957400/ https://www.ncbi.nlm.nih.gov/pubmed/29772020 http://dx.doi.org/10.1371/journal.pone.0197433 |
author | Smith, Stephen A.; Brown, Joseph W.; Walker, Joseph F. |
---|---|
collection | PubMed |
description | Phylogenomic datasets have been successfully used to address questions involving evolutionary relationships, patterns of genome structure, signatures of selection, and gene and genome duplications. However, despite the recent explosion in genomic and transcriptomic data, the utility of these data sources for efficient divergence-time inference remains unexamined. Phylogenomic datasets pose two distinct problems for divergence-time estimation: (i) the volume of data makes inference of the entire dataset intractable, and (ii) the extent of underlying topological and rate heterogeneity across genes makes model mis-specification a real concern. “Gene shopping”, wherein a phylogenomic dataset is winnowed to a set of genes with desirable properties, represents an alternative approach that holds promise in alleviating these issues. We implemented an approach for phylogenomic datasets (available in SortaDate) that filters genes by three criteria: (i) clock-likeness, (ii) reasonable tree length (i.e., discernible information content), and (iii) least topological conflict with a focal species tree (presumed to have already been inferred). Such a winnowing procedure ensures that errors associated with model (both clock and topology) mis-specification are minimized, therefore reducing error in divergence-time estimation. We demonstrated the efficacy of this approach through simulation and applied it to published animal (Aves, Diplopoda, and Hymenoptera) and plant (carnivorous Caryophyllales, broad Caryophyllales, and Vitales) phylogenomic datasets. By quantifying rate heterogeneity across both genes and lineages we found that every empirical dataset examined included genes with clock-like, or nearly clock-like, behavior. Moreover, many datasets had genes that were clock-like, exhibited reasonable evolutionary rates, and were mostly compatible with the species tree. We identified overlap in age estimates when analyzing these filtered genes under strict clock and uncorrelated lognormal (UCLN) models. However, this overlap was often due to imprecise estimates from the UCLN model. We find that “gene shopping” can be an efficient approach to divergence-time inference for phylogenomic datasets that may otherwise be characterized by extensive gene tree heterogeneity. |
format | Online Article Text |
id | pubmed-5957400 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2018 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-5957400 2018-05-31 So many genes, so little time: A practical approach to divergence-time estimation in the genomic era Smith, Stephen A. Brown, Joseph W. Walker, Joseph F. PLoS One Research Article Public Library of Science 2018-05-17 /pmc/articles/PMC5957400/ /pubmed/29772020 http://dx.doi.org/10.1371/journal.pone.0197433 Text en © 2018 Smith et al http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. |
title | So many genes, so little time: A practical approach to divergence-time estimation in the genomic era |
topic | Research Article |
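The three filtering criteria named in the description (clock-likeness, reasonable tree length, and least conflict with a focal species tree) can be made concrete with a small, self-contained Python sketch. It is not SortaDate's code and does not reproduce its interface. The tree representation, all helper names (`Node`, `tips`, `bipartitions`, `rank_genes`, etc.), the use of root-to-tip variance as the clock-likeness measure, the fraction of shared species-tree bipartitions as the concordance measure, the length bounds, and the ranking order are illustrative assumptions. Gene trees are assumed to be rooted and to contain exactly the species tree's taxa.

```python
from dataclasses import dataclass, field
from statistics import pvariance
from typing import List, Optional


@dataclass
class Node:
    """Rooted tree node: tips carry a taxon name, every node a branch length."""
    name: Optional[str] = None
    length: float = 0.0
    children: List["Node"] = field(default_factory=list)


def leaf(name, length):
    return Node(name, length)


def clade(length, *kids):
    return Node(None, length, list(kids))


def tips(node):
    """Set of taxon names at or below this node."""
    if not node.children:
        return {node.name}
    return set().union(*(tips(c) for c in node.children))


def root_to_tip(node, depth=0.0):
    """Root-to-tip path lengths (the root's own branch length is ignored)."""
    if not node.children:
        return [depth]
    out = []
    for c in node.children:
        out.extend(root_to_tip(c, depth + c.length))
    return out


def tree_length(node):
    """Sum of all branch lengths below the root."""
    return sum(c.length + tree_length(c) for c in node.children)


def bipartitions(node, taxa):
    """Non-trivial splits, each keyed by the side lacking min(taxa), so a clade
    and its complement (e.g. the two children of the root) give one key."""
    anchor = min(taxa)
    splits = set()
    stack = [node]
    while stack:
        n = stack.pop()
        side = tips(n)
        if 1 < len(side) < len(taxa) - 1:
            splits.add(frozenset(taxa - side if anchor in side else side))
        stack.extend(n.children)
    return splits


def rank_genes(gene_trees, species_tree, min_len, max_len):
    """Hypothetical filter: (ii) drop genes whose tree length falls outside
    [min_len, max_len]; then rank by (iii) the fraction of species-tree splits
    the gene tree recovers (high first) and (i) root-to-tip variance as an
    assumed clock-likeness proxy (low first). Gene trees must be rooted and
    share the species tree's taxa."""
    taxa = tips(species_tree)
    sp_splits = bipartitions(species_tree, taxa)
    rows = []
    for gene_id, tree in gene_trees.items():
        length = tree_length(tree)
        if not (min_len <= length <= max_len):
            continue
        concordance = len(bipartitions(tree, taxa) & sp_splits) / len(sp_splits)
        variance = pvariance(root_to_tip(tree))
        rows.append((gene_id, concordance, variance, length))
    rows.sort(key=lambda r: (-r[1], r[2]))
    return rows


# Toy data: species tree ((A,B),(C,D)) and four hypothetical gene trees.
species = clade(0.0, clade(0.5, leaf("A", 0.5), leaf("B", 0.5)),
                clade(0.5, leaf("C", 0.5), leaf("D", 0.5)))
genes = {
    "g1": clade(0.0, clade(0.25, leaf("A", 0.75), leaf("B", 0.75)),
                clade(0.25, leaf("C", 0.75), leaf("D", 0.75))),  # concordant, clock-like
    "g2": clade(0.0, clade(0.25, leaf("A", 0.75), leaf("C", 0.75)),
                clade(0.25, leaf("B", 0.75), leaf("D", 0.75))),  # conflicting topology
    "g3": clade(0.0, clade(0.25, leaf("A", 0.25), leaf("B", 1.25)),
                clade(0.25, leaf("C", 0.75), leaf("D", 0.75))),  # concordant, rate-variable
    "g4": clade(0.0, clade(5.0, leaf("A", 5.0), leaf("B", 5.0)),
                clade(5.0, leaf("C", 5.0), leaf("D", 5.0))),  # implausibly long
}

for row in rank_genes(genes, species, min_len=0.5, max_len=10.0):
    print(row)
```

On the toy data, g4 is dropped by the length bounds and the remaining genes come out in the order g1, g3, g2: the concordant, clock-like gene first, the concordant but rate-variable gene second, and the conflicting gene last. Ranking on concordance before clock-likeness is only one possible ordering; the relative priority given to the three criteria is a user choice in this sketch.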