
A Branch-Heterogeneous Model of Protein Evolution for Efficient Inference of Ancestral Sequences

Most models of nucleotide or amino acid substitution used in phylogenetic studies assume that the evolutionary process has been homogeneous across lineages and that composition of nucleotides or amino acids has remained the same throughout the tree. These oversimplified assumptions are refuted by th...


Bibliographic Details
Main authors: Groussin, M., Boussau, B., Gouy, M.
Format: Online Article Text
Language: English
Published: Oxford University Press 2013
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3676677/
https://www.ncbi.nlm.nih.gov/pubmed/23475623
http://dx.doi.org/10.1093/sysbio/syt016
author Groussin, M.
Boussau, B.
Gouy, M.
collection PubMed
description Most models of nucleotide or amino acid substitution used in phylogenetic studies assume that the evolutionary process has been homogeneous across lineages and that composition of nucleotides or amino acids has remained the same throughout the tree. These oversimplified assumptions are refuted by the observation that compositional variability characterizes extant biological sequences. Branch-heterogeneous models of protein evolution that account for compositional variability have been developed, but are not yet in common use because of the large number of parameters required, leading to high computational costs and potential overparameterization. Here, we present a new branch-nonhomogeneous and nonstationary model of protein evolution that captures more accurately the high complexity of sequence evolution. This model, henceforth called Correspondence and likelihood analysis (COaLA), makes use of a correspondence analysis to reduce the number of parameters to be optimized through maximum likelihood, focusing on most of the compositional variation observed in the data. The model was thoroughly tested on both simulated and biological data sets to show its high performance in terms of data fitting and CPU time. COaLA efficiently estimates ancestral amino acid frequencies and sequences, making it relevant for studies aiming at reconstructing and resurrecting ancestral amino acid sequences. Finally, we applied COaLA on a concatenate of universal amino acid sequences to confirm previous results obtained with a nonhomogeneous Bayesian model regarding the early pattern of adaptation to optimal growth temperature, supporting the mesophilic nature of the Last Universal Common Ancestor. [Ancestral sequence reconstruction; nonhomogeneous model; optimal growth temperature; phylogenomics; phylogeny.]
format Online
Article
Text
id pubmed-3676677
institution National Center for Biotechnology Information
language English
publishDate 2013
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-3676677 2013-06-10 A Branch-Heterogeneous Model of Protein Evolution for Efficient Inference of Ancestral Sequences Groussin, M. Boussau, B. Gouy, M. Syst Biol Regular Articles Oxford University Press 2013-07 2013-04-10 /pmc/articles/PMC3676677/ /pubmed/23475623 http://dx.doi.org/10.1093/sysbio/syt016 Text en © The Author(s) 2013. Published by Oxford University Press. http://creativecommons.org/licenses/by-nc/3.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
title A Branch-Heterogeneous Model of Protein Evolution for Efficient Inference of Ancestral Sequences
topic Regular Articles