DNA Sequences Are as Useful as Protein Sequences for Inferring Deep Phylogenies

Inference of deep phylogenies has almost exclusively used protein rather than DNA sequences based on the perception that protein sequences are less prone to homoplasy and saturation or to issues of compositional heterogeneity than DNA sequences. Here, we analyze a model of codon evolution under an i...

Bibliographic Details
Main Authors: Kapli, Paschalia; Kotari, Ioanna; Telford, Maximilian J; Goldman, Nick; Yang, Ziheng
Format: Online Article Text
Language: English
Published: Oxford University Press 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10627555/
https://www.ncbi.nlm.nih.gov/pubmed/37366056
http://dx.doi.org/10.1093/sysbio/syad036
_version_ 1785131548834529280
author Kapli, Paschalia
Kotari, Ioanna
Telford, Maximilian J
Goldman, Nick
Yang, Ziheng
collection PubMed
description Inference of deep phylogenies has almost exclusively used protein rather than DNA sequences based on the perception that protein sequences are less prone to homoplasy and saturation or to issues of compositional heterogeneity than DNA sequences. Here, we analyze a model of codon evolution under an idealized genetic code and demonstrate that those perceptions may be misconceptions. We conduct a simulation study to assess the utility of protein versus DNA sequences for inferring deep phylogenies, with protein-coding data generated under models of heterogeneous substitution processes across sites in the sequence and among lineages on the tree, and then analyzed using nucleotide, amino acid, and codon models. Analysis of DNA sequences under nucleotide-substitution models (possibly with the third codon positions excluded) recovered the correct tree at least as often as analysis of the corresponding protein sequences under modern amino acid models. We also applied the different data-analysis strategies to an empirical dataset to infer the metazoan phylogeny. Our results from both simulated and real data suggest that DNA sequences may be as useful as proteins for inferring deep phylogenies and should not be excluded from such analyses. Analysis of DNA data under nucleotide models has a major computational advantage over protein-data analysis, potentially making it feasible to use advanced models that account for among-site and among-lineage heterogeneity in the nucleotide-substitution process in inference of deep phylogenies.
format Online
Article
Text
id pubmed-10627555
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-10627555 2023-11-07 DNA Sequences Are as Useful as Protein Sequences for Inferring Deep Phylogenies. Kapli, Paschalia; Kotari, Ioanna; Telford, Maximilian J; Goldman, Nick; Yang, Ziheng. Syst Biol. Regular Manuscripts. Oxford University Press 2023-06-27 /pmc/articles/PMC10627555/ /pubmed/37366056 http://dx.doi.org/10.1093/sysbio/syad036 Text en © The Author(s) 2023. Published by Oxford University Press on behalf of the Society of Systematic Biologists. https://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title DNA Sequences Are as Useful as Protein Sequences for Inferring Deep Phylogenies
topic Regular Manuscripts
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10627555/
https://www.ncbi.nlm.nih.gov/pubmed/37366056
http://dx.doi.org/10.1093/sysbio/syad036