
Excluding Loci With Substitution Saturation Improves Inferences From Phylogenomic Data

The historical signal in nucleotide sequences becomes eroded over time by substitutions occurring repeatedly at the same sites. This phenomenon, known as substitution saturation, is recognized as one of the primary obstacles to deep-time phylogenetic inference using genome-scale data sets. We presen...


Bibliographic Details
Main authors: Duchêne, David A, Mather, Niklas, Van Der Wal, Cara, Ho, Simon Y W
Format: Online Article Text
Language: English
Published: Oxford University Press 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9016599/
https://www.ncbi.nlm.nih.gov/pubmed/34508605
http://dx.doi.org/10.1093/sysbio/syab075
author Duchêne, David A
Mather, Niklas
Van Der Wal, Cara
Ho, Simon Y W
collection PubMed
description The historical signal in nucleotide sequences becomes eroded over time by substitutions occurring repeatedly at the same sites. This phenomenon, known as substitution saturation, is recognized as one of the primary obstacles to deep-time phylogenetic inference using genome-scale data sets. We present a new test of substitution saturation and demonstrate its performance in simulated and empirical data. For some of the 36 empirical phylogenomic data sets that we examined, we detect substitution saturation in around 50% of loci. We found that saturation tends to be flagged as problematic in loci with highly discordant phylogenetic signals across sites. Within each data set, the loci with smaller numbers of informative sites are more likely to be flagged as containing problematic levels of saturation. The entropy saturation test proposed here is sensitive to high evolutionary rates relative to the evolutionary timeframe, while also being sensitive to several factors known to mislead phylogenetic inference, including short internal branches relative to external branches, short nucleotide sequences, and tree imbalance. Our study demonstrates that excluding loci with substitution saturation can be an effective means of mitigating the negative impact of multiple substitutions on phylogenetic inferences. [Phylogenetic model performance; phylogenomics; substitution model; substitution saturation; test statistics.]
format Online
Article
Text
id pubmed-9016599
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-9016599 2022-04-20 Excluding Loci With Substitution Saturation Improves Inferences From Phylogenomic Data Duchêne, David A Mather, Niklas Van Der Wal, Cara Ho, Simon Y W Syst Biol Regular Articles The historical signal in nucleotide sequences becomes eroded over time by substitutions occurring repeatedly at the same sites. This phenomenon, known as substitution saturation, is recognized as one of the primary obstacles to deep-time phylogenetic inference using genome-scale data sets. We present a new test of substitution saturation and demonstrate its performance in simulated and empirical data. For some of the 36 empirical phylogenomic data sets that we examined, we detect substitution saturation in around 50% of loci. We found that saturation tends to be flagged as problematic in loci with highly discordant phylogenetic signals across sites. Within each data set, the loci with smaller numbers of informative sites are more likely to be flagged as containing problematic levels of saturation. The entropy saturation test proposed here is sensitive to high evolutionary rates relative to the evolutionary timeframe, while also being sensitive to several factors known to mislead phylogenetic inference, including short internal branches relative to external branches, short nucleotide sequences, and tree imbalance. Our study demonstrates that excluding loci with substitution saturation can be an effective means of mitigating the negative impact of multiple substitutions on phylogenetic inferences. [Phylogenetic model performance; phylogenomics; substitution model; substitution saturation; test statistics.] Oxford University Press 2021-09-11 /pmc/articles/PMC9016599/ /pubmed/34508605 http://dx.doi.org/10.1093/sysbio/syab075 Text en © The Author(s) 2021. Published by Oxford University Press on behalf of the Society of Systematic Biologists.
https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title Excluding Loci With Substitution Saturation Improves Inferences From Phylogenomic Data
topic Regular Articles