Non-Markovian effects on protein sequence evolution due to site dependent substitution rates
Main authors: | Rizzato, Francesca; Rodriguez, Alex; Laio, Alessandro |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central, 2016 |
Subjects: | Methodology Article |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4921000/ ; https://www.ncbi.nlm.nih.gov/pubmed/27342318 ; http://dx.doi.org/10.1186/s12859-016-1135-1 |
author | Rizzato, Francesca; Rodriguez, Alex; Laio, Alessandro |
---|---|
collection | PubMed |
description | BACKGROUND: Many models of protein sequence evolution, in particular those based on Point Accepted Mutation (PAM) matrices, assume that the dynamics is Markovian. Nevertheless, it has been observed that evolution seems to proceed differently at different time scales, which calls this assumption into question. In 2011 Kosiol and Goldman proved that, if evolution is Markovian at the codon level, it cannot be Markovian at the amino acid level. However, it remains unclear to what extent the Markov assumption holds at the codon level. RESULTS: Here we show that the among-site variability of substitution rates also makes the process of full protein sequence evolution effectively non-Markovian, even at the codon level. This may be the theoretical explanation for the well-known systematic underestimation of evolutionary distances observed when rate variability is omitted. If substitution rate variability is neglected, the average amino acid and codon replacement probabilities are affected by systematic errors, and the largest mismatches occur for substitutions involving more than one nucleotide at a time. On the other hand, instantaneous substitution matrices estimated from alignments under the Markov assumption tend to overestimate double and triple substitutions, even when learned from alignments at high sequence identity. CONCLUSIONS: These results discourage the use of simple Markov models to describe full protein sequence evolution and encourage the use, whenever possible, of models that account for rate variability by construction (such as hidden Markov models or mixture models) or of substitution models of the type of Le and Gascuel (2008) that account for it explicitly. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-016-1135-1) contains supplementary material, which is available to authorized users. |
format | Online Article Text |
id | pubmed-4921000 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2016 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
journal | BMC Bioinformatics (Methodology Article), BioMed Central, published 2016-06-24 |
rights | © The Author(s) 2016. Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
title | Non-Markovian effects on protein sequence evolution due to site dependent substitution rates |
topic | Methodology Article |
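The abstract's central claim, that averaging over sites that each evolve at their own fixed rate yields a process that is no longer Markovian, can be checked numerically on a toy model. The sketch below is an illustration under stated assumptions, not the authors' method: it uses a 4-state Jukes-Cantor (JC69) process instead of codon or amino acid models, and the two-category rate distribution (`RATES`, `WEIGHTS`) and the helper names (`jc_matrix`, `mixture_matrix`, `jc_distance`) are hypothetical. It tests the Chapman-Kolmogorov identity P(t1 + t2) = P(t1)P(t2), which every time-homogeneous Markov process satisfies, and estimates a distance while ignoring rate variability.

```python
import math

# Toy illustration (assumption): a 4-state Jukes-Cantor (JC69) process, NOT the
# codon-level models studied in the paper. Time t is measured in expected
# substitutions per site at rate 1.
N_STATES = 4


def jc_matrix(t, rate=1.0):
    """Transition probability matrix P(t) of JC69 for one site with a given rate."""
    decay = math.exp(-4.0 * rate * t / 3.0)
    same = 0.25 + 0.75 * decay
    diff = 0.25 - 0.25 * decay
    return [[same if i == j else diff for j in range(N_STATES)]
            for i in range(N_STATES)]


def matmul(a, b):
    """Plain-Python product of two square matrices."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mixture_matrix(t, rates, weights):
    """Site-averaged P(t) when each site evolves at its own fixed rate."""
    mats = [jc_matrix(t, r) for r in rates]
    return [[sum(w * m[i][j] for w, m in zip(weights, mats))
             for j in range(N_STATES)] for i in range(N_STATES)]


def jc_distance(p_diff):
    """JC69 distance estimate from the fraction of differing sites."""
    return -0.75 * math.log(1.0 - 4.0 * p_diff / 3.0)


# Hypothetical two-category rate distribution with mean rate 1.
RATES = [0.2, 1.8]
WEIGHTS = [0.5, 0.5]

t1, t2 = 0.5, 0.5

# Chapman-Kolmogorov check on the diagonal entry P_same.
single_gap = abs(jc_matrix(t1 + t2)[0][0]
                 - matmul(jc_matrix(t1), jc_matrix(t2))[0][0])
mix_direct = mixture_matrix(t1 + t2, RATES, WEIGHTS)
mix_composed = matmul(mixture_matrix(t1, RATES, WEIGHTS),
                      mixture_matrix(t2, RATES, WEIGHTS))
mixture_gap = mix_direct[0][0] - mix_composed[0][0]

# Distance estimate ignoring rate variation; the true distance is
# mean rate * time = 1.0 * t.
t = 1.0
p_diff = 1.0 - mixture_matrix(t, RATES, WEIGHTS)[0][0]
estimate = jc_distance(p_diff)

print(f"single-rate CK violation: {single_gap:.2e}")
print(f"mixture CK violation (P_same direct - composed): {mixture_gap:.4f}")
print(f"true distance {t:.2f}, JC estimate ignoring rate variation {estimate:.3f}")
```

For a single rate the identity holds up to rounding error, while for the rate mixture the directly averaged matrix keeps more probability on the diagonal than the composed one, so chaining short-time "Markov" matrices overstates the change accumulated over longer times in this toy setting. By Jensen's inequality, the JC69 estimate computed from the mixture falls below the true distance for any non-degenerate rate distribution, which is consistent with the distance underestimation mentioned in the abstract.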