Superiority of a mechanistic codon substitution model even for protein sequences in Phylogenetic analysis

BACKGROUND: Nucleotide and amino acid substitution tendencies are characteristic of each species, organelle, and protein family. Hence, various empirical amino acid substitution rate matrices have needed to be estimated for phylogenetic analysis: JTT, WAG, and LG for nuclear proteins, mtREV for mito...

Bibliographic Details
Main author: Miyazawa, Sanzo
Format: Online Article Text
Language: English
Published: BioMed Central 2013
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4225520/
https://www.ncbi.nlm.nih.gov/pubmed/24256155
http://dx.doi.org/10.1186/1471-2148-13-257
_version_ 1782343525982011392
author Miyazawa, Sanzo
author_facet Miyazawa, Sanzo
author_sort Miyazawa, Sanzo
collection PubMed
description BACKGROUND: Nucleotide and amino acid substitution tendencies are characteristic of each species, organelle, and protein family. Hence, various empirical amino acid substitution rate matrices have had to be estimated for phylogenetic analysis: JTT, WAG, and LG for nuclear proteins, mtREV for mitochondrial proteins, cpREV10 and cpREV64 for chloroplast-encoded proteins, and FLU for influenza proteins. On the other hand, in a mechanistic codon substitution model, in which each codon substitution rate is proportional to the product of a codon mutation rate and the ratio of fixation depending on the type of amino acid replacement, mutation rates and the strength of selective constraint on amino acids can be tailored to each protein family with 11 additional parameters. As a result, in the evolutionary analysis of codon sequences it outperforms codon substitution models equivalent to empirical amino acid substitution matrices. Is it superior even for amino acid sequences, among which synonymous substitutions cannot be identified? RESULTS: Nucleotide mutations are assumed to occur independently of codon positions, but multiple nucleotide changes in infinitesimal time are allowed. Selective constraints on the respective types of amino acid replacements are tailored to each gene with a linear function of a given estimate of selective constraints, which were estimated by maximizing the likelihood of an empirical amino acid or codon substitution frequency matrix, each of JTT, WAG, LG, and KHG. It is shown that the mechanistic codon substitution model with the assumption of equal codon usage yields better values of the Akaike and Bayesian information criteria for all three phylogenetic trees of mitochondrial, chloroplast, and influenza-A hemagglutinin proteins than the empirical amino acid substitution models with mtREV, cpREV64, and FLU, respectively, which were designed specifically for those protein families. The variation of selective constraint across sites fits the datasets significantly better than variable codon mutation rates, confirming that substitution rate variations across sites detected by amino acid substitution models are caused primarily by the variation of selective constraint against amino acid substitutions rather than the variation of codon mutation rate. CONCLUSIONS: The mechanistic codon substitution model is superior to amino acid substitution models even in the evolutionary analysis of protein sequences.
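The abstract compares models by their Akaike and Bayesian information criteria (AIC and BIC). As a rough illustration of how such a comparison works, the sketch below applies the standard definitions, AIC = 2k - 2 ln L and BIC = k ln n - 2 ln L, to two fits. The log-likelihoods, parameter counts, site count, and model labels are hypothetical placeholders chosen for illustration; they are not values from the article, and the article's own procedure may differ in detail.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln L (lower is better)."""
    return 2.0 * n_params - 2.0 * log_likelihood


def bic(log_likelihood, n_params, n_sites):
    """Bayesian information criterion: k ln n - 2 ln L (lower is better)."""
    return n_params * math.log(n_sites) - 2.0 * log_likelihood


# Hypothetical fits, NOT results from the article: a model with more free
# parameters must gain enough log-likelihood to offset its penalty.
fits = {
    "empirical_aa_model": {"lnL": -10500.0, "k": 1},
    "mechanistic_codon_model": {"lnL": -10400.0, "k": 12},
}
n_sites = 1000  # hypothetical alignment length used as the sample size

scores = {
    name: (aic(f["lnL"], f["k"]), bic(f["lnL"], f["k"], n_sites))
    for name, f in fits.items()
}
best_by_aic = min(scores, key=lambda name: scores[name][0])
best_by_bic = min(scores, key=lambda name: scores[name][1])

for name, (a, b) in scores.items():
    print(f"{name}: AIC={a:.1f} BIC={b:.1f}")
print("preferred by AIC:", best_by_aic)
print("preferred by BIC:", best_by_bic)
```

With these placeholder numbers the extra parameters are outweighed by the likelihood gain under both criteria. BIC penalizes each parameter more heavily than AIC once ln n exceeds 2, so the two criteria can disagree when the gain is small.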
format Online
Article
Text
id pubmed-4225520
institution National Center for Biotechnology Information
language English
publishDate 2013
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4225520 2014-11-12 Superiority of a mechanistic codon substitution model even for protein sequences in Phylogenetic analysis Miyazawa, Sanzo BMC Evol Biol Research Article BioMed Central 2013-11-21 /pmc/articles/PMC4225520/ /pubmed/24256155 http://dx.doi.org/10.1186/1471-2148-13-257 Text en Copyright © 2013 Miyazawa; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle Research Article
Miyazawa, Sanzo
Superiority of a mechanistic codon substitution model even for protein sequences in Phylogenetic analysis
title Superiority of a mechanistic codon substitution model even for protein sequences in Phylogenetic analysis
title_full Superiority of a mechanistic codon substitution model even for protein sequences in Phylogenetic analysis
title_fullStr Superiority of a mechanistic codon substitution model even for protein sequences in Phylogenetic analysis
title_full_unstemmed Superiority of a mechanistic codon substitution model even for protein sequences in Phylogenetic analysis
title_short Superiority of a mechanistic codon substitution model even for protein sequences in Phylogenetic analysis
title_sort superiority of a mechanistic codon substitution model even for protein sequences in phylogenetic analysis
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4225520/
https://www.ncbi.nlm.nih.gov/pubmed/24256155
http://dx.doi.org/10.1186/1471-2148-13-257
work_keys_str_mv AT miyazawasanzo superiorityofamechanisticcodonsubstitutionmodelevenforproteinsequencesinphylogeneticanalysis