
On universal common ancestry, sequence similarity, and phylogenetic structure: the sins of P-values and the virtues of Bayesian evidence

BACKGROUND: The universal common ancestry (UCA) of all known life is a fundamental component of modern evolutionary theory, supported by a wide range of qualitative molecular evidence. Nevertheless, recently both the status and nature of UCA have been questioned. In earlier work I presented a formal,...


Bibliographic Details
Main Author: Theobald, Douglas L
Format: Online Article Text
Language: English
Published: BioMed Central 2011
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3314578/
https://www.ncbi.nlm.nih.gov/pubmed/22114984
http://dx.doi.org/10.1186/1745-6150-6-60
_version_ 1782228104666677248
author Theobald, Douglas L
collection PubMed
description BACKGROUND: The universal common ancestry (UCA) of all known life is a fundamental component of modern evolutionary theory, supported by a wide range of qualitative molecular evidence. Nevertheless, recently both the status and nature of UCA have been questioned. In earlier work I presented a formal, quantitative test of UCA in which model selection criteria overwhelmingly choose common ancestry over independent ancestry, based on a dataset of universally conserved proteins. These model-based tests are founded in likelihoodist and Bayesian probability theory, in opposition to classical frequentist null hypothesis tests such as Karlin-Altschul E-values for sequence similarity. In a recent comment, Koonin and Wolf (K&W) claim that the model preference for UCA is "a trivial consequence of significant sequence similarity". They support this claim with a computational simulation, derived from universally conserved proteins, which produces similar sequences lacking phylogenetic structure. The model selection tests prefer common ancestry for this artificial dataset. RESULTS: For the real universal protein sequences, hierarchical phylogenetic structure (induced by genealogical history) is the overriding reason why the tests choose UCA; sequence similarity is a relatively minor factor. First, for cases of conflicting phylogenetic structure, the tests choose independent ancestry even with highly similar sequences. Second, certain models, like star trees and K&W's profile model (corresponding to their simulation), readily explain sequence similarity yet lack phylogenetic structure. However, these are extremely poor models for the real proteins, even worse than independent ancestry models, though they explain K&W's artificial data well. Finally, K&W's simulation is an implementation of a well-known phylogenetic model, and it produces sequences that mimic homologous proteins. Therefore the model selection tests work appropriately with the artificial data.
CONCLUSIONS: For K&W's artificial protein data, sequence similarity is the predominant factor influencing the preference for common ancestry. In contrast, for the real proteins, model selection tests show that phylogenetic structure is much more important than sequence similarity. Hence, the model selection tests demonstrate that real universally conserved proteins are homologous, a conclusion based primarily on the specific nested patterns of correlations induced in genetically related protein sequences. REVIEWERS: This article was reviewed by Rob Knight, Robert Beiko (nominated by Peter Gogarten), and Michael Gilchrist.
format Online
Article
Text
id pubmed-3314578
institution National Center for Biotechnology Information
language English
publishDate 2011
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3314578 2012-04-02 On universal common ancestry, sequence similarity, and phylogenetic structure: the sins of P-values and the virtues of Bayesian evidence Theobald, Douglas L Biol Direct Research BioMed Central 2011-11-24 /pmc/articles/PMC3314578/ /pubmed/22114984 http://dx.doi.org/10.1186/1745-6150-6-60 Text en Copyright ©2011 Theobald; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Research
Theobald, Douglas L
On universal common ancestry, sequence similarity, and phylogenetic structure: the sins of P-values and the virtues of Bayesian evidence
title On universal common ancestry, sequence similarity, and phylogenetic structure: the sins of P-values and the virtues of Bayesian evidence
title_sort on universal common ancestry, sequence similarity, and phylogenetic structure: the sins of p-values and the virtues of bayesian evidence
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3314578/
https://www.ncbi.nlm.nih.gov/pubmed/22114984
http://dx.doi.org/10.1186/1745-6150-6-60
work_keys_str_mv AT theobalddouglasl onuniversalcommonancestrysequencesimilarityandphylogeneticstructurethesinsofpvaluesandthevirtuesofbayesianevidence