
Ambiguity Coding Allows Accurate Inference of Evolutionary Parameters from Alignments in an Aggregated State-Space

How can we best learn the history of a protein’s evolution? Ideally, a model of sequence evolution should capture both the process that generates genetic variation and the functional constraints determining which changes are fixed. However, in practical terms the most suitable approach may simply be...


Bibliographic Details
Main authors: Weber, Claudia C, Perron, Umberto, Casey, Dearbhaile, Yang, Ziheng, Goldman, Nick
Format: Online Article Text
Language: English
Published: Oxford University Press 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7744038/
https://www.ncbi.nlm.nih.gov/pubmed/32353118
http://dx.doi.org/10.1093/sysbio/syaa036
_version_ 1783624354444083200
author Weber, Claudia C
Perron, Umberto
Casey, Dearbhaile
Yang, Ziheng
Goldman, Nick
author_sort Weber, Claudia C
collection PubMed
description How can we best learn the history of a protein’s evolution? Ideally, a model of sequence evolution should capture both the process that generates genetic variation and the functional constraints determining which changes are fixed. However, in practical terms the most suitable approach may simply be the one that combines the convenience of easily available input data with the ability to return useful parameter estimates. For example, we might be interested in a measure of the strength of selection (typically obtained using a codon model) or an ancestral structure (obtained using structural modeling based on inferred amino acid sequence and side chain configuration). But what if data in the relevant state-space are not readily available? We show that it is possible to obtain accurate estimates of the outputs of interest using an established method for handling missing data. Encoding observed characters in an alignment as ambiguous representations of characters in a larger state-space allows the application of models with the desired features to data that lack the resolution that is normally required. This strategy is viable because the evolutionary path taken through the observed space contains information about states that were likely visited in the “unseen” state-space. To illustrate this, we consider two examples with amino acid sequences as input. We show that ω, a parameter describing the relative strength of selection on nonsynonymous and synonymous changes, can be estimated in an unbiased manner using an adapted version of a standard 61-state codon model. Using simulated and empirical data, we find that ancestral amino acid side chain configuration can be inferred by applying a 55-state empirical model to 20-state amino acid data. Where feasible, combining inputs from both ambiguity-coded and fully resolved data improves accuracy. Adding structural information to as few as 12.5% of the sequences in an amino acid alignment results in remarkable ancestral reconstruction performance compared to a benchmark that considers the full rotamer state information. These examples show that our methods permit the recovery of evolutionary information from sequences where it has previously been inaccessible. [Ancestral reconstruction; natural selection; protein structure; state-spaces; substitution models.]
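The ambiguity coding described in the abstract can be illustrated with a minimal Python sketch for the amino-acid-to-codon case. It is not the authors' implementation: the names `CODON_TABLE`, `SENSE_CODONS` and `tip_vector` are hypothetical, the standard genetic code is assumed, and gap or unknown characters are assumed to be treated as fully ambiguous. Each observed amino acid becomes a 0/1 vector over the 61 sense codons, marking every codon that could have produced it; vectors of this kind are what a likelihood calculation under a 61-state codon model would take as tip data.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order; '*' marks stop codons.
AMINO = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)

# Map each of the 64 codons to its amino acid (or '*').
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# The 61-state space: all codons except the three stop codons.
SENSE_CODONS = [codon for codon, aa in CODON_TABLE.items() if aa != "*"]


def tip_vector(aa):
    """Encode one observed amino acid as an ambiguous set of codon states.

    Returns a list of 61 floats: 1.0 for every sense codon compatible with
    the observation, 0.0 otherwise. Gaps and unknowns ('-', '?', 'X') are
    assumed to be compatible with every codon.
    """
    if aa in {"-", "?", "X"}:
        return [1.0] * len(SENSE_CODONS)
    vec = [1.0 if CODON_TABLE[codon] == aa else 0.0 for codon in SENSE_CODONS]
    if not any(vec):
        raise ValueError(f"not a standard amino acid: {aa!r}")
    return vec


def encode_sequence(protein):
    """Encode a protein sequence as one ambiguity vector per site."""
    return [tip_vector(aa) for aa in protein]
```

For example, `encode_sequence("MLW")` gives three vectors with one, six and one compatible codons respectively, reflecting how much codon-level information each observed amino acid leaves unresolved.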
format Online
Article
Text
id pubmed-7744038
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-7744038 2020-12-22 Ambiguity Coding Allows Accurate Inference of Evolutionary Parameters from Alignments in an Aggregated State-Space Weber, Claudia C Perron, Umberto Casey, Dearbhaile Yang, Ziheng Goldman, Nick Syst Biol Regular Articles Oxford University Press 2020-04-30 /pmc/articles/PMC7744038/ /pubmed/32353118 http://dx.doi.org/10.1093/sysbio/syaa036 Text en © The Author(s) 2020. Published by Oxford University Press, on behalf of the Society of Systematic Biologists. http://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Ambiguity Coding Allows Accurate Inference of Evolutionary Parameters from Alignments in an Aggregated State-Space
topic Regular Articles