Using all Gene Families Vastly Expands Data Available for Phylogenomic Inference
Traditionally, single-copy orthologs have been the gold standard in phylogenomics. Most phylogenomic studies identify putative single-copy orthologs using clustering approaches and retain families with a single sequence per species. This limits the amount of data available by excluding larger famili...
Main Authors: | Smith, Megan L., Vanderpool, Dan, Hahn, Matthew W. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press 2022 |
Subjects: | Discoveries |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9178227/ https://www.ncbi.nlm.nih.gov/pubmed/35642314 http://dx.doi.org/10.1093/molbev/msac112 |
_version_ | 1784723013483102208 |
---|---|
author | Smith, Megan L.; Vanderpool, Dan; Hahn, Matthew W. |
collection | PubMed |
description | Traditionally, single-copy orthologs have been the gold standard in phylogenomics. Most phylogenomic studies identify putative single-copy orthologs using clustering approaches and retain families with a single sequence per species. This limits the amount of data available by excluding larger families. Recent advances have suggested several ways to include data from larger families. For instance, tree-based decomposition methods facilitate the extraction of orthologs from large families. Additionally, several methods for species tree inference are robust to the inclusion of paralogs and could use all of the data from larger families. Here, we explore the effects of using all families for phylogenetic inference by examining relationships among 26 primate species in detail and by analyzing five additional data sets. We compare single-copy families, orthologs extracted using tree-based decomposition approaches, and all families with all data. We explore several species tree inference methods, finding that identical trees are returned across nearly all subsets of the data and methods for primates. The relationships among Platyrrhini remain contentious; however, the species tree inference method matters more than the subset of data used. Using data from larger gene families drastically increases the number of genes available and leads to consistent estimates of branch lengths, nodal certainty and concordance, and inferences of introgression in primates. For the other data sets, topological inferences are consistent whether single-copy families or orthologs extracted using decomposition approaches are analyzed. Using larger gene families is a promising approach to include more data in phylogenomics without sacrificing accuracy, at least when high-quality genomes are available. |
format | Online Article Text |
id | pubmed-9178227 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-9178227 2022-06-09 Mol Biol Evol Discoveries Oxford University Press 2022-06-01 /pmc/articles/PMC9178227/ /pubmed/35642314 http://dx.doi.org/10.1093/molbev/msac112 Text en © The Author(s) 2022. Published by Oxford University Press on behalf of Society for Molecular Biology and Evolution. https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. |
title | Using all Gene Families Vastly Expands Data Available for Phylogenomic Inference |
topic | Discoveries |