The effect of alignment uncertainty, substitution models and priors in building and dating the mammal tree of life
BACKGROUND: The flood of genomic data to help build and date the tree of life requires automation at several critical junctures, most importantly during sequence assembly and alignment. It is widely appreciated that automated alignment protocols can yield inaccuracies, but the relative impact of var...
Main authors: | Du, Yan; Wu, Shaoyuan; Edwards, Scott V.; Liu, Liang |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2019 |
Subjects: | Research Article |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6833305/ https://www.ncbi.nlm.nih.gov/pubmed/31694538 http://dx.doi.org/10.1186/s12862-019-1534-9 |
_version_ | 1783466359704780800 |
---|---|
author | Du, Yan; Wu, Shaoyuan; Edwards, Scott V.; Liu, Liang |
author_sort | Du, Yan |
collection | PubMed |
description | BACKGROUND: The flood of genomic data to help build and date the tree of life requires automation at several critical junctures, most importantly during sequence assembly and alignment. It is widely appreciated that automated alignment protocols can yield inaccuracies, but the relative impact of various sources of error on phylogenomic analysis is not yet known. This study employs an updated mammal data set of 5162 coding loci sampled from 90 species to evaluate the effects of alignment uncertainty, substitution models, and fossil priors on gene tree, species tree, and divergence time estimation. Additionally, a novel coalescent likelihood ratio test is introduced for comparing competing species trees against a given set of gene trees. RESULTS: The aligned DNA sequences of 5162 loci from 90 species were trimmed and filtered using trimAL and two filtering protocols. The final dataset contains 4 sets of alignments: before trimming, after trimming, filtered by a recently proposed pipeline, and further filtered by comparing ML gene trees for each locus with the concatenation tree. Our analyses suggest that the average discordance among the coalescent trees is significantly smaller than that among the concatenation trees estimated from the 4 sets of alignments or with different substitution models. There is no significant difference among the divergence times estimated with different substitution models. However, the divergence dates estimated from the alignments after trimming are more recent than those estimated from the alignments before trimming. CONCLUSIONS: Our results highlight that alignment uncertainty of the updated mammal data set and the choice of substitution models have little impact on tree topologies yielded by coalescent methods for species tree estimation, whereas they are more influential on the trees made by concatenation. Given the choice of calibration scheme and clock models, divergence time estimates are robust to the choice of substitution models, but removing alignments deemed problematic by trimming algorithms can lead to more recent dates. Although the fossil prior is important in divergence time estimation, Bayesian estimates of divergence times in this data set are driven primarily by the sequence data. |
format | Online Article Text |
id | pubmed-6833305 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2019 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-6833305 2019-11-08 The effect of alignment uncertainty, substitution models and priors in building and dating the mammal tree of life. Du, Yan; Wu, Shaoyuan; Edwards, Scott V.; Liu, Liang. BMC Evol Biol. Research Article. BioMed Central 2019-11-06 /pmc/articles/PMC6833305/ /pubmed/31694538 http://dx.doi.org/10.1186/s12862-019-1534-9 Text en © The Author(s). 2019 Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
title | The effect of alignment uncertainty, substitution models and priors in building and dating the mammal tree of life |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6833305/ https://www.ncbi.nlm.nih.gov/pubmed/31694538 http://dx.doi.org/10.1186/s12862-019-1534-9 |