Toward Inferring Potts Models for Phylogenetically Correlated Sequence Data

Bibliographic Details
Main authors: Rodriguez Horta, Edwin; Barrat-Charlaix, Pierre; Weigt, Martin
Format: Online article, text
Language: English
Published: MDPI 2019
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7514434/
http://dx.doi.org/10.3390/e21111090
Collection: PubMed
Abstract: Global coevolutionary models of protein families have become increasingly popular because they can predict residue–residue contacts from sequence information, as well as the fitness effects of amino acid substitutions and protein–protein interactions. The central idea of these models is to construct a probability distribution, a Potts model, that reproduces the single-site and pairwise amino acid frequencies found in natural sequences of the protein family. This approach treats the sequences of a family as independent samples and completely ignores the phylogenetic relations between them. This simplification is known to potentially bias the estimated model parameters, reducing their biological relevance. Current workarounds for this problem, such as sequence reweighting, are poorly understood and lack a principled justification. Here, we propose an inference scheme that takes the phylogeny of a protein family into account in order to correct biases in the estimated amino acid frequencies. Using artificial data, we show that a Potts model inferred from these corrected frequencies performs better at predicting contacts and the fitness effects of mutations. First tests on real protein data, which are only partially successful, are also presented.
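The two ingredients named in the abstract, the single-site and pairwise amino acid frequencies that a Potts model P(a_1, …, a_L) ∝ exp(Σ_i h_i(a_i) + Σ_{i<j} J_ij(a_i, a_j)) is fitted to reproduce, and the sequence-reweighting workaround it criticises, can be illustrated with a short sketch. This is not the authors' phylogeny-aware scheme. It shows the widely used identity-threshold heuristic, in which each sequence is down-weighted by the number of sequences (itself included) that share at least a fraction θ of its positions. The function names, the integer encoding of the alignment, and the default θ = 0.8 are illustrative assumptions.

```python
import numpy as np


def sequence_weights(msa: np.ndarray, theta: float = 0.8) -> np.ndarray:
    """Hypothetical helper: identity-threshold reweighting (assumed heuristic).

    Each sequence gets weight 1 / n, where n counts the sequences (itself
    included) whose fraction of identical positions is >= theta.
    """
    # Pairwise fraction of identical positions, shape (M, M).
    # Builds an (M, M, L) boolean array, so this is only meant for small alignments.
    identity = (msa[:, None, :] == msa[None, :, :]).mean(axis=2)
    n_neighbours = (identity >= theta).sum(axis=1)
    return 1.0 / n_neighbours


def weighted_frequencies(msa: np.ndarray, q: int, weights: np.ndarray):
    """Hypothetical helper: weighted single-site f_i(a) and pairwise f_ij(a, b)."""
    M, L = msa.shape
    # One-hot encoding: one_hot[m, i, a] = 1 if sequence m has state a at column i.
    one_hot = np.zeros((M, L, q))
    one_hot[np.arange(M)[:, None], np.arange(L)[None, :], msa] = 1.0
    w = weights / weights.sum()  # normalise so each frequency table sums to 1
    f_i = np.einsum("m,mia->ia", w, one_hot)
    f_ij = np.einsum("m,mia,mjb->ijab", w, one_hot, one_hot)
    return f_i, f_ij


if __name__ == "__main__":
    # Toy alignment: 3 sequences, 3 columns, states encoded as 0..2 (q = 3).
    msa = np.array([[0, 1, 2],
                    [0, 1, 2],   # exact duplicate of the first sequence
                    [1, 1, 0]])
    weights = sequence_weights(msa)            # 0.5, 0.5, 1.0
    f_i, f_ij = weighted_frequencies(msa, 3, weights)
    print(weights)
    print(f_i[0])                              # 0.5, 0.5, 0.0 (unweighted: 2/3, 1/3, 0)
```

On the toy alignment the two identical sequences each receive weight 0.5, so together they count as a single effective sequence. A correction of this kind ignores how the sequences are actually related, which is the limitation the abstract attributes to reweighting.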
Record ID: pubmed-7514434
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: Entropy (Basel)
Publication date: 2019-11-07
License: © 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).