Phylogenomic Subsampling and the Search for Phylogenetically Reliable Loci

Bibliographic Details
Main author: Mongiardino Koch, Nicolás
Format: Online Article Text
Language: English
Published: Oxford University Press, 2021
Subjects: Methods
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8382905/
https://www.ncbi.nlm.nih.gov/pubmed/33983409
http://dx.doi.org/10.1093/molbev/msab151
collection PubMed
description Phylogenomic subsampling is a procedure by which small sets of loci are selected from large genome-scale data sets and used for phylogenetic inference. This step is often motivated either by the computational limitations of complex inference methods or by the need to test the robustness of phylogenetic results by discarding loci deemed potentially misleading. Although many alternative methods of phylogenomic subsampling have been proposed, little effort has gone into comparing their behavior across different data sets. Here, I calculate multiple gene properties for a range of phylogenomic data sets spanning animal, fungal, and plant clades, uncovering a remarkable predictability in their patterns of covariance. I also show that these patterns provide a means of ordering loci by both their rate of evolution and their relative phylogenetic usefulness. This method of retrieving phylogenetically useful loci ranks among the top-performing approaches when compared with alternative subsampling protocols. Relatively common approaches, such as minimizing potential sources of systematic bias or increasing the clock-likeness of the data, fare worse than selecting loci at random. Likewise, the general utility of rate-based subsampling is limited: loci evolving at both low and high rates are among the least effective, and even those evolving at optimal rates can still differ widely in usefulness. This study shows that many common subsampling approaches introduce unintended effects on off-target gene properties, and proposes an alternative multivariate method that optimizes phylogenetic signal while simultaneously controlling for known sources of bias.
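The abstract describes ordering loci along axes of covariation among gene properties, but does not name the multivariate technique. The sketch below assumes a principal component analysis (PCA) of z-scored per-locus properties: loci are ranked by their score on the first component, and a same-sized random subset serves as the kind of baseline the abstract compares subsampling methods against. The property names and values, the sign-orientation rule, and the helpers `pc1_scores`, `top_k`, and `random_k` are hypothetical illustrations, not the author's published implementation.

```python
# Minimal sketch (assumption): PCA-based ranking of loci plus a random baseline.
# Not the author's published method; all names and data here are hypothetical.
import math
import random


def standardize(columns):
    """Z-score each property column (population SD); constant columns become zeros."""
    result = []
    for col in columns:
        n = len(col)
        mean = sum(col) / n
        sd = math.sqrt(sum((x - mean) ** 2 for x in col) / n)
        result.append([(x - mean) / sd if sd > 0 else 0.0 for x in col])
    return result


def correlation_matrix(z_columns):
    """Covariance of z-scored columns, which equals their Pearson correlation matrix."""
    p, n = len(z_columns), len(z_columns[0])
    return [[sum(z_columns[i][k] * z_columns[j][k] for k in range(n)) / n
             for j in range(p)] for i in range(p)]


def leading_eigenvector(matrix, iters=500):
    """Power iteration: approximates the unit eigenvector with the largest eigenvalue."""
    p = len(matrix)
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iters):
        w = [sum(matrix[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            break
        v = [x / norm for x in w]
    return v


def pc1_scores(properties, orient_by):
    """Score each locus on PC1 of its standardized properties.

    properties: dict of property name -> per-locus values (same locus order in each list).
    orient_by: property whose PC1 loading is made positive (eigenvector signs are arbitrary).
    """
    names = list(properties)
    z = standardize([properties[name] for name in names])
    v = leading_eigenvector(correlation_matrix(z))
    if v[names.index(orient_by)] < 0:
        v = [-x for x in v]
    scores = [sum(v[i] * z[i][k] for i in range(len(names))) for k in range(len(z[0]))]
    return scores, dict(zip(names, v))


def top_k(scores, k):
    """Indices of the k highest-scoring loci, best first."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]


def random_k(n_loci, k, seed=0):
    """Random baseline: k distinct locus indices drawn without replacement."""
    return sorted(random.Random(seed).sample(range(n_loci), k))


# Hypothetical properties for six loci; names and values are invented for illustration.
props = {
    "mean_support": [95, 90, 60, 55, 85, 40],
    "rf_similarity": [0.90, 0.85, 0.50, 0.45, 0.80, 0.30],
    "comp_heterogeneity": [0.10, 0.15, 0.50, 0.55, 0.20, 0.70],
}
scores, loadings = pc1_scores(props, orient_by="mean_support")
print(top_k(scores, 3))          # [0, 1, 4]
print(random_k(len(scores), 3))  # same-sized random subset for comparison
```

Power iteration keeps the sketch free of dependencies; with real data sets a linear-algebra library's symmetric eigensolver would be the more likely choice. Because the sign of a principal component is arbitrary, the score is oriented so that one chosen property (here the hypothetical `mean_support`) loads positively; which properties to include and which one to orient by are analysis decisions, not fixed rules.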
id pubmed-8382905
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
Mol Biol Evol, Methods, 2021-08-25
Oxford University Press, 2021-05-13. © The Author(s) 2021. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
topic Methods