Phylogenomic Subsampling and the Search for Phylogenetically Reliable Loci
Phylogenomic subsampling is a procedure by which small sets of loci are selected from large genome-scale data sets and used for phylogenetic inference. This step is often motivated by either computational limitations associated with the use of complex inference methods or as a means of testing the r...
Main author: | Mongiardino Koch, Nicolás |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press 2021 |
Subjects: | Methods |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8382905/ https://www.ncbi.nlm.nih.gov/pubmed/33983409 http://dx.doi.org/10.1093/molbev/msab151 |
_version_ | 1783741629609279488 |
---|---|
author | Mongiardino Koch, Nicolás |
collection | PubMed |
description | Phylogenomic subsampling is a procedure by which small sets of loci are selected from large genome-scale data sets and used for phylogenetic inference. This step is often motivated by either computational limitations associated with the use of complex inference methods or as a means of testing the robustness of phylogenetic results by discarding loci that are deemed potentially misleading. Although many alternative methods of phylogenomic subsampling have been proposed, little effort has gone into comparing their behavior across different data sets. Here, I calculate multiple gene properties for a range of phylogenomic data sets spanning animal, fungal, and plant clades, uncovering a remarkable predictability in their patterns of covariance. I also show how these patterns provide a means for ordering loci by both their rate of evolution and their relative phylogenetic usefulness. This method of retrieving phylogenetically useful loci is found to be among the top performing when compared with alternative subsampling protocols. Relatively common approaches such as minimizing potential sources of systematic bias or increasing the clock-likeness of the data are found to fare worse than selecting loci at random. Likewise, the general utility of rate-based subsampling is found to be limited: loci evolving at both low and high rates are among the least effective, and even those evolving at optimal rates can still widely differ in usefulness. This study shows that many common subsampling approaches introduce unintended effects in off-target gene properties and proposes an alternative multivariate method that simultaneously optimizes phylogenetic signal while controlling for known sources of bias. |
format | Online Article Text |
id | pubmed-8382905 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-8382905 2021-08-25 Phylogenomic Subsampling and the Search for Phylogenetically Reliable Loci Mongiardino Koch, Nicolás Mol Biol Evol Methods Oxford University Press 2021-05-13 /pmc/articles/PMC8382905/ /pubmed/33983409 http://dx.doi.org/10.1093/molbev/msab151 Text en © The Author(s) 2021. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com |
spellingShingle | Methods Mongiardino Koch, Nicolás Phylogenomic Subsampling and the Search for Phylogenetically Reliable Loci |
title | Phylogenomic Subsampling and the Search for Phylogenetically Reliable Loci |
title_sort | phylogenomic subsampling and the search for phylogenetically reliable loci |
topic | Methods |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8382905/ https://www.ncbi.nlm.nih.gov/pubmed/33983409 http://dx.doi.org/10.1093/molbev/msab151 |
work_keys_str_mv | AT mongiardinokochnicolas phylogenomicsubsamplingandthesearchforphylogeneticallyreliableloci |