An Efficient, Nonphylogenetic Method for Detecting Genes Sharing Evolutionary Signals in Phylogenomic Data Sets

Bibliographic Details
Main authors: Rangel, Luiz Thibério; Soucy, Shannon M; Setubal, João C; Gogarten, Johann Peter; Fournier, Gregory P
Format: Online, Article, Text
Language: English
Published: Oxford University Press 2021
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8483891/
https://www.ncbi.nlm.nih.gov/pubmed/34390574
http://dx.doi.org/10.1093/gbe/evab187
collection PubMed
description Assessing the compatibility between gene family phylogenies is a crucial and often computationally demanding step in many phylogenomic analyses. Here, we describe the Evolutionary Similarity Index, a means to assess shared evolution between gene families using a weighted orthogonal distance regression model applied to sequence distances. Using pairwise distance matrices circumvents comparisons between gene tree topologies, which are inherently uncertain and sensitive to evolutionary model choice, phylogenetic reconstruction artifacts, and other sources of error. Furthermore, the index enables the many-to-many pairing of multiple copies between similarly evolving gene families. This is done by selecting the non-overlapping pairs of copies, one from each assessed family, that yield the least sum of squared residuals. Analyses of simulated gene family data sets show that the index's accuracy is on par with popular tree-based methods, while it is also less susceptible to noise introduced by sequence alignment and evolutionary model fitting. Applying the index to an empirical data set of 1,322 genes from 42 archaeal genomes identified eight major clusters of gene families with compatible evolutionary trends. The most cohesive cluster consisted of 62 genes with compatible evolutionary signal, which occur both as single copies and as multiple homologs per genome; phylogenetic analysis of concatenated alignments from this cluster produced a tree closely matching previously published species trees for Archaea. Four other clusters are mainly composed of accessory genes with limited distribution among Archaea and are enriched in specific metabolic functions. Pairwise evolutionary distances obtained from these accessory gene clusters suggest patterns of interphyla horizontal gene transfer. An implementation of the index is available at https://github.com/lthiberiol/evolSimIndex.
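The description above involves two computational steps: regressing one gene family's pairwise distances on another's, and choosing the non-overlapping pairs of copies that yield the least sum of squared residuals. The Python sketch below illustrates both under stated assumptions. It fits an unweighted orthogonal regression through the origin, whereas the published model is weighted (its weighting scheme is not given in this record), and it searches copy pairings by brute force. The names `odr_through_origin`, `candidate_matchings`, and `best_pairing` are hypothetical and are not the evolSimIndex API; the sketch also does not compute the index value itself.

```python
import itertools
import math


def odr_through_origin(xs, ys):
    """Unweighted orthogonal regression of ys on xs through the origin.

    Assumption: no intercept and no weights (a simplification of the
    weighted model named in the abstract). Returns (slope, rss), where
    rss is the sum of squared perpendicular distances to the fitted line.
    """
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    if sxy <= 0:
        raise ValueError("expected positively related distances")
    # The slope minimizing sum((y - b*x)**2) / (1 + b**2) is the positive
    # root of sxy*b**2 + (sxx - syy)*b - sxy = 0.
    slope = ((syy - sxx) + math.sqrt((syy - sxx) ** 2 + 4 * sxy ** 2)) / (2 * sxy)
    rss = sum((y - slope * x) ** 2 for x, y in zip(xs, ys)) / (1 + slope ** 2)
    return slope, rss


def candidate_matchings(copies_a, copies_b):
    """Yield every one-to-one pairing of two families' copies in one genome."""
    k = min(len(copies_a), len(copies_b))
    for chosen_b in itertools.combinations(copies_b, k):
        for chosen_a in itertools.permutations(copies_a, k):
            yield list(zip(chosen_a, chosen_b))


def best_pairing(genomes_a, genomes_b, dist_a, dist_b):
    """Brute-force the non-overlapping copy pairs with the least residuals.

    genomes_a / genomes_b map a genome name to that family's copy IDs;
    dist_a / dist_b map frozenset({id1, id2}) to a pairwise distance.
    Exponential in the number of multi-copy genomes: illustration only.
    """
    shared = sorted(set(genomes_a) & set(genomes_b))
    options = [list(candidate_matchings(genomes_a[g], genomes_b[g])) for g in shared]
    best = None
    for choice in itertools.product(*options):
        pairs = [pair for matching in choice for pair in matching]
        xs, ys = [], []
        # Distance between two chosen pairs, measured within each family.
        for (a1, b1), (a2, b2) in itertools.combinations(pairs, 2):
            xs.append(dist_a[frozenset((a1, a2))])
            ys.append(dist_b[frozenset((b1, b2))])
        _, rss = odr_through_origin(xs, ys)
        if best is None or rss < best[0]:
            best = (rss, pairs)
    return best
```

Because the number of candidate pairings multiplies across multi-copy genomes, this exhaustive search only makes the objective concrete; a practical tool would need a faster matching strategy.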
id pubmed-8483891
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
Journal: Genome Biol Evol (Research Article), Oxford University Press, 2021-08-13
© The Author(s) 2021. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
topic Research Article