Cargando…
A Genome-Scale Investigation of How Sequence, Function, and Tree-Based Gene Properties Influence Phylogenetic Inference
Molecular phylogenetic inference is inherently dependent on choices in both methodology and data. Many insightful studies have shown how choices in methodology, such as the model of sequence evolution or optimality criterion used, can strongly influence inference. In contrast, much less is known abo...
Autores principales: | , , |
---|---|
Formato: | Online Artículo Texto |
Lenguaje: | English |
Publicado: |
Oxford University Press
2016
|
Materias: | |
Acceso en línea: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5010910/ https://www.ncbi.nlm.nih.gov/pubmed/27492233 http://dx.doi.org/10.1093/gbe/evw179 |
_version_ | 1782451752787771392 |
---|---|
author | Shen, Xing-Xing Salichos, Leonidas Rokas, Antonis |
author_facet | Shen, Xing-Xing Salichos, Leonidas Rokas, Antonis |
author_sort | Shen, Xing-Xing |
collection | PubMed |
description | Molecular phylogenetic inference is inherently dependent on choices in both methodology and data. Many insightful studies have shown how choices in methodology, such as the model of sequence evolution or optimality criterion used, can strongly influence inference. In contrast, much less is known about the impact of choices in the properties of the data, typically genes, on phylogenetic inference. We investigated the relationships between 52 gene properties (24 sequence-based, 19 function-based, and 9 tree-based) with each other and with three measures of phylogenetic signal in two assembled data sets of 2,832 yeast and 2,002 mammalian genes. We found that most gene properties, such as evolutionary rate (measured through the percent average of pairwise identity across taxa) and total tree length, were highly correlated with each other. Similarly, several gene properties, such as gene alignment length, Guanine-Cytosine content, and the proportion of tree distance on internal branches divided by relative composition variability (treeness/RCV), were strongly correlated with phylogenetic signal. Analysis of partial correlations between gene properties and phylogenetic signal in which gene evolutionary rate and alignment length were simultaneously controlled, showed similar patterns of correlations, albeit weaker in strength. Examination of the relative importance of each gene property on phylogenetic signal identified gene alignment length, alongside with number of parsimony-informative sites and variable sites, as the most important predictors. Interestingly, the subsets of gene properties that optimally predicted phylogenetic signal differed considerably across our three phylogenetic measures and two data sets; however, gene alignment length and RCV were consistently included as predictors of all three phylogenetic measures in both yeasts and mammals. These results suggest that a handful of sequence-based gene properties are reliable predictors of phylogenetic signal and could be useful in guiding the choice of phylogenetic markers. |
format | Online Article Text |
id | pubmed-5010910 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2016 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-50109102016-09-06 A Genome-Scale Investigation of How Sequence, Function, and Tree-Based Gene Properties Influence Phylogenetic Inference Shen, Xing-Xing Salichos, Leonidas Rokas, Antonis Genome Biol Evol Research Article Molecular phylogenetic inference is inherently dependent on choices in both methodology and data. Many insightful studies have shown how choices in methodology, such as the model of sequence evolution or optimality criterion used, can strongly influence inference. In contrast, much less is known about the impact of choices in the properties of the data, typically genes, on phylogenetic inference. We investigated the relationships between 52 gene properties (24 sequence-based, 19 function-based, and 9 tree-based) with each other and with three measures of phylogenetic signal in two assembled data sets of 2,832 yeast and 2,002 mammalian genes. We found that most gene properties, such as evolutionary rate (measured through the percent average of pairwise identity across taxa) and total tree length, were highly correlated with each other. Similarly, several gene properties, such as gene alignment length, Guanine-Cytosine content, and the proportion of tree distance on internal branches divided by relative composition variability (treeness/RCV), were strongly correlated with phylogenetic signal. Analysis of partial correlations between gene properties and phylogenetic signal in which gene evolutionary rate and alignment length were simultaneously controlled, showed similar patterns of correlations, albeit weaker in strength. Examination of the relative importance of each gene property on phylogenetic signal identified gene alignment length, alongside with number of parsimony-informative sites and variable sites, as the most important predictors. Interestingly, the subsets of gene properties that optimally predicted phylogenetic signal differed considerably across our three phylogenetic measures and two data sets; however, gene alignment length and RCV were consistently included as predictors of all three phylogenetic measures in both yeasts and mammals. These results suggest that a handful of sequence-based gene properties are reliable predictors of phylogenetic signal and could be useful in guiding the choice of phylogenetic markers. Oxford University Press 2016-08-04 /pmc/articles/PMC5010910/ /pubmed/27492233 http://dx.doi.org/10.1093/gbe/evw179 Text en © The Author 2016. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. http://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com (http://journals.permissions@oup.com) |
spellingShingle | Research Article Shen, Xing-Xing Salichos, Leonidas Rokas, Antonis A Genome-Scale Investigation of How Sequence, Function, and Tree-Based Gene Properties Influence Phylogenetic Inference |
title | A Genome-Scale Investigation of How Sequence, Function, and Tree-Based Gene Properties Influence Phylogenetic Inference |
title_full | A Genome-Scale Investigation of How Sequence, Function, and Tree-Based Gene Properties Influence Phylogenetic Inference |
title_fullStr | A Genome-Scale Investigation of How Sequence, Function, and Tree-Based Gene Properties Influence Phylogenetic Inference |
title_full_unstemmed | A Genome-Scale Investigation of How Sequence, Function, and Tree-Based Gene Properties Influence Phylogenetic Inference |
title_short | A Genome-Scale Investigation of How Sequence, Function, and Tree-Based Gene Properties Influence Phylogenetic Inference |
title_sort | genome-scale investigation of how sequence, function, and tree-based gene properties influence phylogenetic inference |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5010910/ https://www.ncbi.nlm.nih.gov/pubmed/27492233 http://dx.doi.org/10.1093/gbe/evw179 |
work_keys_str_mv | AT shenxingxing agenomescaleinvestigationofhowsequencefunctionandtreebasedgenepropertiesinfluencephylogeneticinference AT salichosleonidas agenomescaleinvestigationofhowsequencefunctionandtreebasedgenepropertiesinfluencephylogeneticinference AT rokasantonis agenomescaleinvestigationofhowsequencefunctionandtreebasedgenepropertiesinfluencephylogeneticinference AT shenxingxing genomescaleinvestigationofhowsequencefunctionandtreebasedgenepropertiesinfluencephylogeneticinference AT salichosleonidas genomescaleinvestigationofhowsequencefunctionandtreebasedgenepropertiesinfluencephylogeneticinference AT rokasantonis genomescaleinvestigationofhowsequencefunctionandtreebasedgenepropertiesinfluencephylogeneticinference |