An Efficient, Nonphylogenetic Method for Detecting Genes Sharing Evolutionary Signals in Phylogenomic Data Sets
Assessing the compatibility between gene family phylogenies is a crucial and often computationally demanding step in many phylogenomic analyses. Here, we describe the Evolutionary Similarity Index ([Formula: see text]), a means to assess shared evolution between gene families using a weighted orthog...
Main Authors: | Rangel, Luiz Thibério; Soucy, Shannon M; Setubal, João C; Gogarten, Johann Peter; Fournier, Gregory P |
---|---|
Format: | Online Article Text |
Language: | English |
Published: |
Oxford University Press
2021
|
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8483891/ https://www.ncbi.nlm.nih.gov/pubmed/34390574 http://dx.doi.org/10.1093/gbe/evab187 |
_version_ | 1784577211311849472 |
---|---|
author | Rangel, Luiz Thibério Soucy, Shannon M Setubal, João C Gogarten, Johann Peter Fournier, Gregory P |
author_facet | Rangel, Luiz Thibério Soucy, Shannon M Setubal, João C Gogarten, Johann Peter Fournier, Gregory P |
author_sort | Rangel, Luiz Thibério |
collection | PubMed |
description | Assessing the compatibility between gene family phylogenies is a crucial and often computationally demanding step in many phylogenomic analyses. Here, we describe the Evolutionary Similarity Index ([Formula: see text]), a means to assess shared evolution between gene families using a weighted orthogonal distance regression model applied to sequence distances. The utilization of pairwise distance matrices circumvents comparisons between gene tree topologies, which are inherently uncertain and sensitive to evolutionary model choice, phylogenetic reconstruction artifacts, and other sources of error. Furthermore, [Formula: see text] enables the many-to-many pairing of multiple copies between similarly evolving gene families. This is done by selecting non-overlapping pairs of copies, one from each assessed family, and yielding the least sum of squared residuals. Analyses of simulated gene family data sets show that [Formula: see text]’s accuracy is on par with popular tree-based methods while also less susceptible to noise introduced by sequence alignment and evolutionary model fitting. Applying [Formula: see text] to an empirical data set of 1,322 genes from 42 archaeal genomes identified eight major clusters of gene families with compatible evolutionary trends. The most cohesive cluster consisted of 62 genes with compatible evolutionary signal, which occur as both single-copy and multiple homologs per genome; phylogenetic analysis of concatenated alignments from this cluster produced a tree closely matching previously published species trees for Archaea. Four other clusters are mainly composed of accessory genes with limited distribution among Archaea and enriched toward specific metabolic functions. Pairwise evolutionary distances obtained from these accessory gene clusters suggest patterns of interphyla horizontal gene transfer. An [Formula: see text] implementation is available at https://github.com/lthiberiol/evolSimIndex. |
format | Online Article Text |
id | pubmed-8483891 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-8483891 2021-10-01 An Efficient, Nonphylogenetic Method for Detecting Genes Sharing Evolutionary Signals in Phylogenomic Data Sets Rangel, Luiz Thibério Soucy, Shannon M Setubal, João C Gogarten, Johann Peter Fournier, Gregory P Genome Biol Evol Research Article Assessing the compatibility between gene family phylogenies is a crucial and often computationally demanding step in many phylogenomic analyses. Here, we describe the Evolutionary Similarity Index ([Formula: see text]), a means to assess shared evolution between gene families using a weighted orthogonal distance regression model applied to sequence distances. The utilization of pairwise distance matrices circumvents comparisons between gene tree topologies, which are inherently uncertain and sensitive to evolutionary model choice, phylogenetic reconstruction artifacts, and other sources of error. Furthermore, [Formula: see text] enables the many-to-many pairing of multiple copies between similarly evolving gene families. This is done by selecting non-overlapping pairs of copies, one from each assessed family, and yielding the least sum of squared residuals. Analyses of simulated gene family data sets show that [Formula: see text]’s accuracy is on par with popular tree-based methods while also less susceptible to noise introduced by sequence alignment and evolutionary model fitting. Applying [Formula: see text] to an empirical data set of 1,322 genes from 42 archaeal genomes identified eight major clusters of gene families with compatible evolutionary trends. The most cohesive cluster consisted of 62 genes with compatible evolutionary signal, which occur as both single-copy and multiple homologs per genome; phylogenetic analysis of concatenated alignments from this cluster produced a tree closely matching previously published species trees for Archaea. Four other clusters are mainly composed of accessory genes with limited distribution among Archaea and enriched toward specific metabolic functions. Pairwise evolutionary distances obtained from these accessory gene clusters suggest patterns of interphyla horizontal gene transfer. An [Formula: see text] implementation is available at https://github.com/lthiberiol/evolSimIndex. Oxford University Press 2021-08-13 /pmc/articles/PMC8483891/ /pubmed/34390574 http://dx.doi.org/10.1093/gbe/evab187 Text en © The Author(s) 2021. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. https://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com |
spellingShingle | Research Article Rangel, Luiz Thibério Soucy, Shannon M Setubal, João C Gogarten, Johann Peter Fournier, Gregory P An Efficient, Nonphylogenetic Method for Detecting Genes Sharing Evolutionary Signals in Phylogenomic Data Sets |
title | An Efficient, Nonphylogenetic Method for Detecting Genes Sharing Evolutionary Signals in Phylogenomic Data Sets |
title_full | An Efficient, Nonphylogenetic Method for Detecting Genes Sharing Evolutionary Signals in Phylogenomic Data Sets |
title_fullStr | An Efficient, Nonphylogenetic Method for Detecting Genes Sharing Evolutionary Signals in Phylogenomic Data Sets |
title_full_unstemmed | An Efficient, Nonphylogenetic Method for Detecting Genes Sharing Evolutionary Signals in Phylogenomic Data Sets |
title_short | An Efficient, Nonphylogenetic Method for Detecting Genes Sharing Evolutionary Signals in Phylogenomic Data Sets |
title_sort | efficient, nonphylogenetic method for detecting genes sharing evolutionary signals in phylogenomic data sets |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8483891/ https://www.ncbi.nlm.nih.gov/pubmed/34390574 http://dx.doi.org/10.1093/gbe/evab187 |
work_keys_str_mv | AT rangelluizthiberio anefficientnonphylogeneticmethodfordetectinggenessharingevolutionarysignalsinphylogenomicdatasets AT soucyshannonm anefficientnonphylogeneticmethodfordetectinggenessharingevolutionarysignalsinphylogenomicdatasets AT setubaljoaoc anefficientnonphylogeneticmethodfordetectinggenessharingevolutionarysignalsinphylogenomicdatasets AT gogartenjohannpeter anefficientnonphylogeneticmethodfordetectinggenessharingevolutionarysignalsinphylogenomicdatasets AT fourniergregoryp anefficientnonphylogeneticmethodfordetectinggenessharingevolutionarysignalsinphylogenomicdatasets AT rangelluizthiberio efficientnonphylogeneticmethodfordetectinggenessharingevolutionarysignalsinphylogenomicdatasets AT soucyshannonm efficientnonphylogeneticmethodfordetectinggenessharingevolutionarysignalsinphylogenomicdatasets AT setubaljoaoc efficientnonphylogeneticmethodfordetectinggenessharingevolutionarysignalsinphylogenomicdatasets AT gogartenjohannpeter efficientnonphylogeneticmethodfordetectinggenessharingevolutionarysignalsinphylogenomicdatasets AT fourniergregoryp efficientnonphylogeneticmethodfordetectinggenessharingevolutionarysignalsinphylogenomicdatasets |
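The description field above mentions a weighted orthogonal distance regression applied to sequence distances, scored by a sum of squared residuals. The following is only a minimal sketch of that general idea, not the authors' implementation (which is at the GitHub URL in the record). It assumes the regression line is forced through the origin and that weights are simple per-point multipliers; the function name `orthogonal_fit_through_origin` and the example distances are hypothetical.

```python
import math


def orthogonal_fit_through_origin(x, y, weights=None):
    """Fit y = slope * x by minimising weighted squared orthogonal distances.

    x, y    : matching pairwise distances from two gene families
              (same genome pairs, same order).
    weights : optional per-point weights (assumption: plain multipliers).
    Returns (slope, weighted sum of squared orthogonal residuals).
    """
    if weights is None:
        weights = [1.0] * len(x)

    # Weighted, uncentred second moments of the point cloud.
    sxx = sum(w * a * a for w, a in zip(weights, x))
    syy = sum(w * b * b for w, b in zip(weights, y))
    sxy = sum(w * a * b for w, a, b in zip(weights, x, y))

    # Closed-form eigen-decomposition of the 2x2 matrix [[sxx, sxy], [sxy, syy]].
    # The smallest eigenvalue equals the residual sum for the best-fitting line.
    half_trace = (sxx + syy) / 2.0
    radius = math.hypot((sxx - syy) / 2.0, sxy)
    residual = max(half_trace - radius, 0.0)

    # The principal axis gives the direction of the fitted line.
    angle = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    slope = math.tan(angle)
    return slope, residual


# Hypothetical pairwise distances for the same four genome pairs.
dist_family_a = [0.10, 0.25, 0.40, 0.55]
dist_family_b = [0.21, 0.49, 0.80, 1.12]
slope, residual = orthogonal_fit_through_origin(dist_family_a, dist_family_b)
print(f"slope={slope:.3f}, residual={residual:.5f}")
```

A small residual relative to the spread of the distances suggests the two families' distances scale together. Under the same assumptions, candidate pairings of gene copies could be compared by calling this function once per pairing and keeping the one with the smallest residual.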