
On the effect of phylogenetic correlations in coevolution-based contact prediction in proteins

Coevolution-based contact prediction, either directly by coevolutionary couplings resulting from global statistical sequence models or using structural supervision and deep learning, has found widespread application in protein-structure prediction from sequence. However, one of the basic assumptions...


Bibliographic Details
Main authors: Rodriguez Horta, Edwin; Weigt, Martin
Format: Online Article Text
Language: English
Published: Public Library of Science 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8177639/
https://www.ncbi.nlm.nih.gov/pubmed/34029316
http://dx.doi.org/10.1371/journal.pcbi.1008957
author Rodriguez Horta, Edwin
Weigt, Martin
collection PubMed
description Coevolution-based contact prediction, either directly by coevolutionary couplings resulting from global statistical sequence models or using structural supervision and deep learning, has found widespread application in protein-structure prediction from sequence. However, one of the basic assumptions in global statistical modeling is that sequences form an at least approximately independent sample of an unknown probability distribution, which is to be learned from data. In the case of protein families, this assumption is obviously violated by phylogenetic relations between protein sequences. It has turned out to be notoriously difficult to take phylogenetic correlations into account in coevolutionary model learning. Here, we propose a complementary approach: we develop strategies to randomize or resample sequence data, such that conservation patterns and phylogenetic relations are preserved, while intrinsic (i.e. structure- or function-based) coevolutionary couplings are removed. A comparison between the results of Direct Coupling Analysis applied to real and to resampled data shows that the largest coevolutionary couplings, i.e. those used for contact prediction, are only weakly influenced by phylogeny. However, the phylogeny-induced spurious couplings in the resampled data are compatible in size with the first false-positive contact predictions from real data. Dissecting functional from phylogeny-induced couplings might therefore extend accurate contact predictions to the range of intermediate-size couplings.
format Online
Article
Text
id pubmed-8177639
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-8177639 2021-06-07 On the effect of phylogenetic correlations in coevolution-based contact prediction in proteins Rodriguez Horta, Edwin Weigt, Martin PLoS Comput Biol Research Article
Public Library of Science 2021-05-24 /pmc/articles/PMC8177639/ /pubmed/34029316 http://dx.doi.org/10.1371/journal.pcbi.1008957 Text en © 2021 Rodriguez Horta, Weigt https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title On the effect of phylogenetic correlations in coevolution-based contact prediction in proteins
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8177639/
https://www.ncbi.nlm.nih.gov/pubmed/34029316
http://dx.doi.org/10.1371/journal.pcbi.1008957