
So many genes, so little time: A practical approach to divergence-time estimation in the genomic era


Bibliographic details
Main authors:	Smith, Stephen A., Brown, Joseph W., Walker, Joseph F.
Format:	Online Article Text
Language:	English
Published:	Public Library of Science, 2018
Subjects:	Research Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5957400/ https://www.ncbi.nlm.nih.gov/pubmed/29772020 http://dx.doi.org/10.1371/journal.pone.0197433

collection	PubMed
description	Phylogenomic datasets have been successfully used to address questions involving evolutionary relationships, patterns of genome structure, signatures of selection, and gene and genome duplications. However, despite the recent explosion in genomic and transcriptomic data, the utility of these data sources for efficient divergence-time inference remains unexamined. Phylogenomic datasets pose two distinct problems for divergence-time estimation: (i) the volume of data makes inference of the entire dataset intractable, and (ii) the extent of underlying topological and rate heterogeneity across genes makes model mis-specification a real concern. “Gene shopping”, wherein a phylogenomic dataset is winnowed to a set of genes with desirable properties, represents an alternative approach that holds promise in alleviating these issues. We implemented an approach for phylogenomic datasets (available in SortaDate) that filters genes by three criteria: (i) clock-likeness, (ii) reasonable tree length (i.e., discernible information content), and (iii) least topological conflict with a focal species tree (presumed to have already been inferred). Such a winnowing procedure ensures that errors associated with model (both clock and topology) mis-specification are minimized, therefore reducing error in divergence-time estimation. We demonstrated the efficacy of this approach through simulation and applied it to published animal (Aves, Diplopoda, and Hymenoptera) and plant (carnivorous Caryophyllales, broad Caryophyllales, and Vitales) phylogenomic datasets. By quantifying rate heterogeneity across both genes and lineages, we found that every empirical dataset examined included genes with clock-like, or nearly clock-like, behavior. Moreover, many datasets had genes that were clock-like, exhibited reasonable evolutionary rates, and were mostly compatible with the species tree. 
We identified overlap in age estimates when analyzing these filtered genes under strict clock and uncorrelated lognormal (UCLN) models. However, this overlap was often due to imprecise estimates from the UCLN model. We find that “gene shopping” can be an efficient approach to divergence-time inference for phylogenomic datasets that may otherwise be characterized by extensive gene tree heterogeneity.
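The three filtering criteria named in the abstract (clock-likeness, reasonable tree length, and least topological conflict with a species tree) can be illustrated with a short Python sketch. This is not SortaDate's code, and the statistics SortaDate actually computes may differ: the nested-tuple tree format, the helper names (`tip_depths`, `tree_length`, `clades`, `summarize`, `rank_genes`, `cherry`), the tree-length bounds, and the use of rooted clades as the conflict measure are hypothetical choices made for illustration. The sketch assumes every gene tree is already rooted consistently with the species tree and uses root-to-tip variance as the clock-likeness statistic (lower means more clock-like).

```python
from statistics import pvariance

# Hypothetical tree format: a node is (label, branch_length), where label is a
# tip name (str) for a leaf or a list of child nodes for an internal node.
# Trees are assumed to be rooted already.

def tip_depths(node, depth=0.0):
    """Map each tip name to its root-to-tip distance."""
    label, length = node
    depth += length
    if isinstance(label, str):
        return {label: depth}
    out = {}
    for child in label:
        out.update(tip_depths(child, depth))
    return out

def tree_length(node):
    """Sum of all branch lengths in the tree."""
    label, length = node
    if isinstance(label, str):
        return length
    return length + sum(tree_length(child) for child in label)

def _collect(node, out):
    """Return the tips below node, appending each internal node's tip set to out."""
    label, _ = node
    if isinstance(label, str):
        return frozenset([label])
    tips = frozenset().union(*(_collect(child, out) for child in label))
    out.append(tips)
    return tips

def clades(node):
    """Rooted clades (tip sets) of internal nodes, excluding the root itself."""
    out = []
    all_tips = _collect(node, out)
    return {c for c in out if c != all_tips}

def summarize(gene_tree, species_tree):
    depths = tip_depths(gene_tree)
    tips = frozenset(depths)
    # Prune species-tree clades to the tips sampled in this gene tree.
    sp_clades = {c & tips for c in clades(species_tree)}
    g_clades = clades(gene_tree)
    agree = sum(c in sp_clades for c in g_clades)
    return {
        "root_tip_var": pvariance(list(depths.values())),  # clock-likeness
        "tree_length": tree_length(gene_tree),              # information content
        "concordance": agree / len(g_clades) if g_clades else 0.0,
    }

def rank_genes(gene_trees, species_tree, min_len, max_len):
    """Drop genes whose tree length lies outside [min_len, max_len], then sort
    by concordance (highest first) and root-to-tip variance (lowest first)."""
    stats = {name: summarize(t, species_tree) for name, t in gene_trees.items()}
    keep = [n for n, s in stats.items() if min_len <= s["tree_length"] <= max_len]
    keep.sort(key=lambda n: (-stats[n]["concordance"], stats[n]["root_tip_var"]))
    return keep, stats

def cherry(a, la, b, lb, stem):
    """Hypothetical helper: an internal node joining two tips."""
    return ([(a, la), (b, lb)], stem)

# Toy data: species tree ((A,B),(C,D)) and four made-up gene trees.
SPECIES = ([cherry("A", 1, "B", 1, 1), cherry("C", 1, "D", 1, 1)], 0.0)
GENES = {
    "gene1": ([cherry("A", .1, "B", .1, .1), cherry("C", .1, "D", .1, .1)], 0.0),
    "gene2": ([cherry("A", .05, "B", .3, .1), cherry("C", .1, "D", .1, .1)], 0.0),
    "gene3": ([cherry("A", .1, "C", .1, .1), cherry("B", .1, "D", .1, .1)], 0.0),
    "gene4": ([cherry("A", .001, "B", .001, .001), cherry("C", .001, "D", .001, .001)], 0.0),
}

order, stats = rank_genes(GENES, SPECIES, min_len=0.1, max_len=5.0)
print(order)  # ['gene1', 'gene2', 'gene3']
```

In this toy run, `gene4` is dropped because its total length is too short to be informative, `gene1` and `gene2` both match the species tree but `gene1` is perfectly clock-like, and `gene3` conflicts with the species tree. A real analysis would parse Newick files and might weight or order the criteria differently.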
id	pubmed-5957400
institution	National Center for Biotechnology Information
record_format	MEDLINE/PubMed
journal	PLoS One, published 2018-05-17
license	© 2018 Smith et al. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.