
A Genome-Scale Investigation of How Sequence, Function, and Tree-Based Gene Properties Influence Phylogenetic Inference

Molecular phylogenetic inference is inherently dependent on choices in both methodology and data. Many insightful studies have shown how choices in methodology, such as the model of sequence evolution or optimality criterion used, can strongly influence inference. In contrast, much less is known abo...


Bibliographic Details
Main Authors: Shen, Xing-Xing, Salichos, Leonidas, Rokas, Antonis
Format: Online Article Text
Language: English
Published: Oxford University Press 2016
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5010910/
https://www.ncbi.nlm.nih.gov/pubmed/27492233
http://dx.doi.org/10.1093/gbe/evw179
author Shen, Xing-Xing
Salichos, Leonidas
Rokas, Antonis
collection PubMed
description Molecular phylogenetic inference is inherently dependent on choices in both methodology and data. Many insightful studies have shown how choices in methodology, such as the model of sequence evolution or optimality criterion used, can strongly influence inference. In contrast, much less is known about the impact of choices in the properties of the data, typically genes, on phylogenetic inference. We investigated the relationships between 52 gene properties (24 sequence-based, 19 function-based, and 9 tree-based) with each other and with three measures of phylogenetic signal in two assembled data sets of 2,832 yeast and 2,002 mammalian genes. We found that most gene properties, such as evolutionary rate (measured through the percent average of pairwise identity across taxa) and total tree length, were highly correlated with each other. Similarly, several gene properties, such as gene alignment length, guanine-cytosine (GC) content, and the proportion of tree distance on internal branches divided by relative composition variability (treeness/RCV), were strongly correlated with phylogenetic signal. Analysis of partial correlations between gene properties and phylogenetic signal, in which gene evolutionary rate and alignment length were simultaneously controlled, showed similar patterns of correlations, albeit weaker in strength. Examination of the relative importance of each gene property on phylogenetic signal identified gene alignment length, along with the number of parsimony-informative sites and variable sites, as the most important predictors. Interestingly, the subsets of gene properties that optimally predicted phylogenetic signal differed considerably across our three phylogenetic measures and two data sets; however, gene alignment length and RCV were consistently included as predictors of all three phylogenetic measures in both yeasts and mammals. These results suggest that a handful of sequence-based gene properties are reliable predictors of phylogenetic signal and could be useful in guiding the choice of phylogenetic markers.
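The partial-correlation step mentioned in the description can be illustrated with a minimal sketch. It assumes Pearson correlation and the standard recursive formula for partialling out control variables; the record does not state which correlation coefficient or software the authors used, and the function names and toy vectors below are hypothetical, chosen only so the arithmetic is easy to follow.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def partial_corr(x, y, controls):
    """Correlation of x and y with every variable in `controls` held fixed.

    Uses the recursive identity
        r_xy.z = (r_xy - r_xz * r_yz) / sqrt((1 - r_xz^2) * (1 - r_yz^2)),
    applied once per control variable.
    """
    if not controls:
        return pearson(x, y)
    z, rest = controls[0], controls[1:]
    r_xy = partial_corr(x, y, rest)
    r_xz = partial_corr(x, z, rest)
    r_yz = partial_corr(y, z, rest)
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Synthetic toy data (not from the study): x and y are correlated only
# because both contain the shared component z.
z = [-1, -1, 1, 1]   # stands in for a controlled variable, e.g. alignment length
x = [0, -2, 0, 2]    # stands in for a gene property
y = [0, -2, 2, 0]    # stands in for a phylogenetic-signal measure

raw = pearson(x, y)                   # clearly positive
controlled = partial_corr(x, y, [z])  # approximately zero once z is held fixed
print(raw, controlled)
```

Passing two control vectors, for example `partial_corr(prop, signal, [rate, length])`, would control both simultaneously, which corresponds to the setup the description mentions for evolutionary rate and alignment length.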
format Online
Article
Text
id pubmed-5010910
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-5010910 2016-09-06 Genome Biol Evol Research Article Oxford University Press 2016-08-04 /pmc/articles/PMC5010910/ /pubmed/27492233 http://dx.doi.org/10.1093/gbe/evw179 Text en © The Author 2016. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. http://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title A Genome-Scale Investigation of How Sequence, Function, and Tree-Based Gene Properties Influence Phylogenetic Inference
topic Research Article