
Toward Reducing Phylostratigraphic Errors and Biases

Phylostratigraphy is a method for estimating gene age, usually applied to large numbers of genes in order to detect nonrandom age-distributions of gene properties that could shed light on mechanisms of gene origination and evolution. However, phylostratigraphy underestimates gene age with a nonnegli...


Bibliographic Details
Main Authors: Moyers, Bryan A; Zhang, Jianzhi
Format: Online Article Text
Language: English
Published: Oxford University Press 2018
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6105108/
https://www.ncbi.nlm.nih.gov/pubmed/30060201
http://dx.doi.org/10.1093/gbe/evy161
_version_ 1783349602371502080
author Moyers, Bryan A
Zhang, Jianzhi
author_facet Moyers, Bryan A
Zhang, Jianzhi
author_sort Moyers, Bryan A
collection PubMed
description Phylostratigraphy is a method for estimating gene age, usually applied to large numbers of genes in order to detect nonrandom age distributions of gene properties that could shed light on mechanisms of gene origination and evolution. However, phylostratigraphy underestimates gene age with a nonnegligible probability. The underestimation is more severe for genes with certain properties, creating spurious age distributions of these properties and those correlated with these properties. Here we explore three strategies to reduce phylostratigraphic error/bias. First, we test several alternative homology detection methods (PSIBLAST, HMMER, PHMMER, OMA, and GLAM2Scan) in phylostratigraphy, but fail to find any that noticeably outperforms the commonly used BLASTP. Second, using machine learning, we look for predictors of error-prone genes to exclude from phylostratigraphy, but cannot identify reliable predictors. Finally, we remove from phylostratigraphic analysis genes exhibiting errors in simulation, which by definition minimizes error/bias if the simulation is sufficiently realistic. Using this last approach, we show that some previously reported phylostratigraphic trends (e.g., younger proteins tend to evolve more rapidly and be shorter) disappear or even reverse, reconfirming the necessity of controlling phylostratigraphic error/bias. Taken together, our analyses demonstrate that phylostratigraphic errors/biases are refractory to several potential solutions but can be controlled at least partially by the exclusion of error-prone genes identified via realistic simulations. These results are expected to stimulate the judicious use of error-aware phylostratigraphy and reevaluation of previous phylostratigraphic findings.
format Online
Article
Text
id pubmed-6105108
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-6105108 2018-08-27 Toward Reducing Phylostratigraphic Errors and Biases Moyers, Bryan A Zhang, Jianzhi Genome Biol Evol Research Article
Oxford University Press 2018-07-30 /pmc/articles/PMC6105108/ /pubmed/30060201 http://dx.doi.org/10.1093/gbe/evy161 Text en © The Author(s) 2018. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. http://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
spellingShingle Research Article
Moyers, Bryan A
Zhang, Jianzhi
Toward Reducing Phylostratigraphic Errors and Biases
title Toward Reducing Phylostratigraphic Errors and Biases
title_sort toward reducing phylostratigraphic errors and biases
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6105108/
https://www.ncbi.nlm.nih.gov/pubmed/30060201
http://dx.doi.org/10.1093/gbe/evy161
work_keys_str_mv AT moyersbryana towardreducingphylostratigraphicerrorsandbiases
AT zhangjianzhi towardreducingphylostratigraphicerrorsandbiases