
Using all Gene Families Vastly Expands Data Available for Phylogenomic Inference

Traditionally, single-copy orthologs have been the gold standard in phylogenomics. Most phylogenomic studies identify putative single-copy orthologs using clustering approaches and retain families with a single sequence per species. This limits the amount of data available by excluding larger famili...


Bibliographic Details
Main authors: Smith, Megan L., Vanderpool, Dan, Hahn, Matthew W.
Format: Online Article Text
Language: English
Published: Oxford University Press 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9178227/
https://www.ncbi.nlm.nih.gov/pubmed/35642314
http://dx.doi.org/10.1093/molbev/msac112
author Smith, Megan L.
Vanderpool, Dan
Hahn, Matthew W.
collection PubMed
description Traditionally, single-copy orthologs have been the gold standard in phylogenomics. Most phylogenomic studies identify putative single-copy orthologs using clustering approaches and retain families with a single sequence per species. This limits the amount of data available by excluding larger families. Recent advances have suggested several ways to include data from larger families. For instance, tree-based decomposition methods facilitate the extraction of orthologs from large families. Additionally, several methods for species tree inference are robust to the inclusion of paralogs and could use all of the data from larger families. Here, we explore the effects of using all families for phylogenetic inference by examining relationships among 26 primate species in detail and by analyzing five additional data sets. We compare single-copy families, orthologs extracted using tree-based decomposition approaches, and all families with all data. We explore several species tree inference methods, finding that identical trees are returned across nearly all subsets of the data and methods for primates. The relationships among Platyrrhini remain contentious; however, the species tree inference method matters more than the subset of data used. Using data from larger gene families drastically increases the number of genes available and leads to consistent estimates of branch lengths, nodal certainty and concordance, and inferences of introgression in primates. For the other data sets, topological inferences are consistent whether single-copy families or orthologs extracted using decomposition approaches are analyzed. Using larger gene families is a promising approach to include more data in phylogenomics without sacrificing accuracy, at least when high-quality genomes are available.
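The filtering step the description mentions (retaining only families with a single sequence per species) can be illustrated with a minimal sketch. This is not the authors' pipeline: the family names, species names, gene identifiers, and helper functions below are hypothetical and exist only to show how much data such a filter sets aside on a toy example.

```python
from collections import Counter

def is_single_copy(members):
    """Return True if no species contributes more than one sequence.

    `members` is a list of (species, gene_id) pairs; this layout is an
    assumption made for the illustration.
    """
    counts = Counter(species for species, _ in members)
    return all(n == 1 for n in counts.values())

def single_copy_families(families):
    """Keep only the families that pass the single-copy filter."""
    return {name: m for name, m in families.items() if is_single_copy(m)}

def total_sequences(families):
    """Count the sequences across a collection of families."""
    return sum(len(m) for m in families.values())

# Hypothetical toy data: fam2 has two human copies, so the filter drops it.
families = {
    "fam1": [("human", "h1"), ("chimp", "c1"), ("macaque", "m1")],
    "fam2": [("human", "h2"), ("human", "h3"), ("chimp", "c2"), ("macaque", "m2")],
}

single = single_copy_families(families)
print(sorted(single), total_sequences(single), total_sequences(families))
```

In this toy case the single-copy filter keeps one family and three sequences, while using all families keeps both families and seven sequences.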
format Online
Article
Text
id pubmed-9178227
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-9178227 2022-06-09 Using all Gene Families Vastly Expands Data Available for Phylogenomic Inference Smith, Megan L. Vanderpool, Dan Hahn, Matthew W. Mol Biol Evol Discoveries
Oxford University Press 2022-06-01 /pmc/articles/PMC9178227/ /pubmed/35642314 http://dx.doi.org/10.1093/molbev/msac112 Text en © The Author(s) 2022. Published by Oxford University Press on behalf of Society for Molecular Biology and Evolution. https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title Using all Gene Families Vastly Expands Data Available for Phylogenomic Inference
topic Discoveries