A Branch-Heterogeneous Model of Protein Evolution for Efficient Inference of Ancestral Sequences
Most models of nucleotide or amino acid substitution used in phylogenetic studies assume that the evolutionary process has been homogeneous across lineages and that composition of nucleotides or amino acids has remained the same throughout the tree. These oversimplified assumptions are refuted by th...
Main Authors: | Groussin, M., Boussau, B., Gouy, M. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press 2013 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3676677/ https://www.ncbi.nlm.nih.gov/pubmed/23475623 http://dx.doi.org/10.1093/sysbio/syt016 |
_version_ | 1782272653190496256 |
---|---|
author | Groussin, M. Boussau, B. Gouy, M. |
author_facet | Groussin, M. Boussau, B. Gouy, M. |
author_sort | Groussin, M. |
collection | PubMed |
description | Most models of nucleotide or amino acid substitution used in phylogenetic studies assume that the evolutionary process has been homogeneous across lineages and that composition of nucleotides or amino acids has remained the same throughout the tree. These oversimplified assumptions are refuted by the observation that compositional variability characterizes extant biological sequences. Branch-heterogeneous models of protein evolution that account for compositional variability have been developed, but are not yet in common use because of the large number of parameters required, leading to high computational costs and potential overparameterization. Here, we present a new branch-nonhomogeneous and nonstationary model of protein evolution that captures more accurately the high complexity of sequence evolution. This model, henceforth called Correspondence and likelihood analysis (COaLA), makes use of a correspondence analysis to reduce the number of parameters to be optimized through maximum likelihood, focusing on most of the compositional variation observed in the data. The model was thoroughly tested on both simulated and biological data sets to show its high performance in terms of data fitting and CPU time. COaLA efficiently estimates ancestral amino acid frequencies and sequences, making it relevant for studies aiming at reconstructing and resurrecting ancestral amino acid sequences. Finally, we applied COaLA on a concatenate of universal amino acid sequences to confirm previous results obtained with a nonhomogeneous Bayesian model regarding the early pattern of adaptation to optimal growth temperature, supporting the mesophilic nature of the Last Universal Common Ancestor. [Ancestral sequence reconstruction; nonhomogeneous model; optimal growth temperature; phylogenomics; phylogeny.] |
format | Online Article Text |
id | pubmed-3676677 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2013 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-3676677 2013-06-10 A Branch-Heterogeneous Model of Protein Evolution for Efficient Inference of Ancestral Sequences Groussin, M. Boussau, B. Gouy, M. Syst Biol Regular Articles Oxford University Press 2013-07 2013-04-10 /pmc/articles/PMC3676677/ /pubmed/23475623 http://dx.doi.org/10.1093/sysbio/syt016 Text en © The Author(s) 2013. Published by Oxford University Press. http://creativecommons.org/licenses/by-nc/3.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Regular Articles Groussin, M. Boussau, B. Gouy, M. A Branch-Heterogeneous Model of Protein Evolution for Efficient Inference of Ancestral Sequences |
title | A Branch-Heterogeneous Model of Protein Evolution for Efficient Inference of Ancestral Sequences |
title_full | A Branch-Heterogeneous Model of Protein Evolution for Efficient Inference of Ancestral Sequences |
title_fullStr | A Branch-Heterogeneous Model of Protein Evolution for Efficient Inference of Ancestral Sequences |
title_full_unstemmed | A Branch-Heterogeneous Model of Protein Evolution for Efficient Inference of Ancestral Sequences |
title_short | A Branch-Heterogeneous Model of Protein Evolution for Efficient Inference of Ancestral Sequences |
title_sort | branch-heterogeneous model of protein evolution for efficient inference of ancestral sequences |
topic | Regular Articles |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3676677/ https://www.ncbi.nlm.nih.gov/pubmed/23475623 http://dx.doi.org/10.1093/sysbio/syt016 |
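
The description in this record says that COaLA uses a correspondence analysis to concentrate most of the compositional variation into a small number of parameters. As a rough illustration of that general idea only, the sketch below runs a correspondence analysis on a toy table of amino acid counts and reports how much of the total inertia the first axis captures. It is not the authors' implementation: the counts, the four-column table, and the function names are hypothetical, and power iteration is used here in place of a full singular value decomposition so that the example needs only the standard library.

```python
import math

def standardized_residuals(counts):
    """Return standardized residuals plus row and column masses of a count table."""
    total = float(sum(sum(row) for row in counts))
    P = [[x / total for x in row] for row in counts]
    r = [sum(row) for row in P]            # row masses (one per sequence)
    c = [sum(col) for col in zip(*P)]      # column masses (one per residue class)
    S = [[(P[i][j] - r[i] * c[j]) / math.sqrt(r[i] * c[j])
          for j in range(len(c))] for i in range(len(r))]
    return S, r, c

def leading_axis(S, iterations=500):
    """Leading eigenvector and eigenvalue of S^T S, found by power iteration."""
    n_rows, n_cols = len(S), len(S[0])
    A = [[sum(S[i][a] * S[i][b] for i in range(n_rows)) for b in range(n_cols)]
         for a in range(n_cols)]
    v = [float(k + 1) for k in range(n_cols)]   # arbitrary start vector
    for _ in range(iterations):
        w = [sum(A[a][b] * v[b] for b in range(n_cols)) for a in range(n_cols)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:                         # no compositional variation at all
            return [0.0] * n_cols, 0.0
        v = [x / norm for x in w]
    Av = [sum(A[a][b] * v[b] for b in range(n_cols)) for a in range(n_cols)]
    inertia = sum(v[a] * Av[a] for a in range(n_cols))
    return v, inertia

def row_coordinates(S, r, axis):
    """Principal coordinate of each row (sequence) on the given axis."""
    return [sum(S[i][j] * axis[j] for j in range(len(axis))) / math.sqrt(r[i])
            for i in range(len(S))]

# Hypothetical counts: 4 sequences (rows) x 4 residue classes (columns).
counts = [
    [30, 10, 5, 5],
    [28, 12, 6, 4],
    [10, 30, 5, 5],
    [8, 29, 7, 6],
]

S, r, c = standardized_residuals(counts)
axis, inertia = leading_axis(S)
total_inertia = sum(x * x for row in S for x in row)
share = inertia / total_inertia
coords = row_coordinates(S, r, axis)

print(f"share of inertia on axis 1: {share:.3f}")
print("row coordinates on axis 1:", [round(x, 3) for x in coords])
```

In this toy table the first two sequences and the last two form two compositional groups, so a single axis summarizes most of the variation and one coordinate per sequence could stand in for a full frequency vector. How COaLA maps such axes onto branch-specific model parameters is not described in this record.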