
The effect of alignment uncertainty, substitution models and priors in building and dating the mammal tree of life

BACKGROUND: The flood of genomic data to help build and date the tree of life requires automation at several critical junctures, most importantly during sequence assembly and alignment. It is widely appreciated that automated alignment protocols can yield inaccuracies, but the relative impact of var...


Bibliographic Details
Main authors: Du, Yan; Wu, Shaoyuan; Edwards, Scott V.; Liu, Liang
Format: Online Article Text
Language: English
Published: BioMed Central 2019
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6833305/
https://www.ncbi.nlm.nih.gov/pubmed/31694538
http://dx.doi.org/10.1186/s12862-019-1534-9
_version_ 1783466359704780800
author Du, Yan
Wu, Shaoyuan
Edwards, Scott V.
Liu, Liang
author_facet Du, Yan
Wu, Shaoyuan
Edwards, Scott V.
Liu, Liang
author_sort Du, Yan
collection PubMed
description BACKGROUND: The flood of genomic data to help build and date the tree of life requires automation at several critical junctures, most importantly during sequence assembly and alignment. It is widely appreciated that automated alignment protocols can yield inaccuracies, but the relative impact of various sources of error on phylogenomic analysis is not yet known. This study employs an updated mammal data set of 5162 coding loci sampled from 90 species to evaluate the effects of alignment uncertainty, substitution models, and fossil priors on gene tree, species tree, and divergence time estimation. Additionally, a novel coalescent likelihood ratio test is introduced for comparing competing species trees against a given set of gene trees. RESULTS: The aligned DNA sequences of 5162 loci from 90 species were trimmed and filtered using trimAL and two filtering protocols. The final dataset contains 4 sets of alignments - before trimming, after trimming, filtered by a recently proposed pipeline, and further filtered by comparing ML gene trees for each locus with the concatenation tree. Our analyses suggest that the average discordance among the coalescent trees is significantly smaller than that among the concatenation trees estimated from the 4 sets of alignments or with different substitution models. There is no significant difference among the divergence times estimated with different substitution models. However, the divergence dates estimated from the alignments after trimming are more recent than those estimated from the alignments before trimming. CONCLUSIONS: Our results highlight that alignment uncertainty of the updated mammal data set and the choice of substitution models have little impact on tree topologies yielded by coalescent methods for species tree estimation, whereas they are more influential on the trees made by concatenation. Given the choice of calibration scheme and clock models, divergence time estimates are robust to the choice of substitution models, but removing alignments deemed problematic by trimming algorithms can lead to more recent dates. Although the fossil prior is important in divergence time estimation, Bayesian estimates of divergence times in this data set are driven primarily by the sequence data.
format Online
Article
Text
id pubmed-6833305
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-6833305 2019-11-08 The effect of alignment uncertainty, substitution models and priors in building and dating the mammal tree of life Du, Yan Wu, Shaoyuan Edwards, Scott V. Liu, Liang BMC Evol Biol Research Article BACKGROUND: The flood of genomic data to help build and date the tree of life requires automation at several critical junctures, most importantly during sequence assembly and alignment. It is widely appreciated that automated alignment protocols can yield inaccuracies, but the relative impact of various sources of error on phylogenomic analysis is not yet known. This study employs an updated mammal data set of 5162 coding loci sampled from 90 species to evaluate the effects of alignment uncertainty, substitution models, and fossil priors on gene tree, species tree, and divergence time estimation. Additionally, a novel coalescent likelihood ratio test is introduced for comparing competing species trees against a given set of gene trees. RESULTS: The aligned DNA sequences of 5162 loci from 90 species were trimmed and filtered using trimAL and two filtering protocols. The final dataset contains 4 sets of alignments - before trimming, after trimming, filtered by a recently proposed pipeline, and further filtered by comparing ML gene trees for each locus with the concatenation tree. Our analyses suggest that the average discordance among the coalescent trees is significantly smaller than that among the concatenation trees estimated from the 4 sets of alignments or with different substitution models. There is no significant difference among the divergence times estimated with different substitution models. However, the divergence dates estimated from the alignments after trimming are more recent than those estimated from the alignments before trimming. CONCLUSIONS: Our results highlight that alignment uncertainty of the updated mammal data set and the choice of substitution models have little impact on tree topologies yielded by coalescent methods for species tree estimation, whereas they are more influential on the trees made by concatenation. Given the choice of calibration scheme and clock models, divergence time estimates are robust to the choice of substitution models, but removing alignments deemed problematic by trimming algorithms can lead to more recent dates. Although the fossil prior is important in divergence time estimation, Bayesian estimates of divergence times in this data set are driven primarily by the sequence data. BioMed Central 2019-11-06 /pmc/articles/PMC6833305/ /pubmed/31694538 http://dx.doi.org/10.1186/s12862-019-1534-9 Text en © The Author(s). 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle Research Article
Du, Yan
Wu, Shaoyuan
Edwards, Scott V.
Liu, Liang
The effect of alignment uncertainty, substitution models and priors in building and dating the mammal tree of life
title The effect of alignment uncertainty, substitution models and priors in building and dating the mammal tree of life
title_sort effect of alignment uncertainty, substitution models and priors in building and dating the mammal tree of life
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6833305/
https://www.ncbi.nlm.nih.gov/pubmed/31694538
http://dx.doi.org/10.1186/s12862-019-1534-9