On the effect of phylogenetic correlations in coevolution-based contact prediction in proteins
Coevolution-based contact prediction, either directly by coevolutionary couplings resulting from global statistical sequence models or using structural supervision and deep learning, has found widespread application in protein-structure prediction from sequence. However, one of the basic assumptions...
Main authors: | Rodriguez Horta, Edwin; Weigt, Martin |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Public Library of Science 2021 |
Subjects: | Research Article |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8177639/ https://www.ncbi.nlm.nih.gov/pubmed/34029316 http://dx.doi.org/10.1371/journal.pcbi.1008957 |
_version_ | 1783703424529858560 |
---|---|
author | Rodriguez Horta, Edwin Weigt, Martin |
author_sort | Rodriguez Horta, Edwin |
collection | PubMed |
description | Coevolution-based contact prediction, either directly by coevolutionary couplings resulting from global statistical sequence models or using structural supervision and deep learning, has found widespread application in protein-structure prediction from sequence. However, one of the basic assumptions in global statistical modeling is that sequences form an at least approximately independent sample of an unknown probability distribution, which is to be learned from data. In the case of protein families, this assumption is obviously violated by phylogenetic relations between protein sequences. It has turned out to be notoriously difficult to take phylogenetic correlations into account in coevolutionary model learning. Here, we propose a complementary approach: we develop strategies to randomize or resample sequence data, such that conservation patterns and phylogenetic relations are preserved, while intrinsic (i.e. structure- or function-based) coevolutionary couplings are removed. A comparison between the results of Direct Coupling Analysis applied to real and to resampled data shows that the largest coevolutionary couplings, i.e. those used for contact prediction, are only weakly influenced by phylogeny. However, the phylogeny-induced spurious couplings in the resampled data are compatible in size with the first false-positive contact predictions from real data. Dissecting functional from phylogeny-induced couplings might therefore extend accurate contact predictions to the range of intermediate-size couplings. |
format | Online Article Text |
id | pubmed-8177639 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-8177639 2021-06-07 On the effect of phylogenetic correlations in coevolution-based contact prediction in proteins. Rodriguez Horta, Edwin; Weigt, Martin. PLoS Comput Biol. Research Article. Public Library of Science 2021-05-24 /pmc/articles/PMC8177639/ /pubmed/34029316 http://dx.doi.org/10.1371/journal.pcbi.1008957 Text en © 2021 Rodriguez Horta, Weigt https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. |
title | On the effect of phylogenetic correlations in coevolution-based contact prediction in proteins |
topic | Research Article |
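
The abstract describes resampling sequence data so that conservation is kept while intrinsic coevolutionary couplings are removed. This record does not spell out the authors' strategies, so the following is only a minimal sketch of the simplest related null model, assuming a multiple sequence alignment encoded as an integer array (rows are sequences, columns are sites). It shuffles each column independently, which preserves per-site amino-acid frequencies but, unlike the approach described in the abstract, also destroys phylogenetic relations. The names `shuffle_columns`, `site_frequencies`, and `toy_msa` are hypothetical and chosen for illustration.

```python
import numpy as np


def shuffle_columns(msa, seed=0):
    """Permute every alignment column independently.

    Assumption: this is a generic conservation-preserving null model,
    not the phylogeny-preserving resampling proposed in the paper.
    """
    rng = np.random.default_rng(seed)
    msa = np.asarray(msa)
    out = np.empty_like(msa)
    for j in range(msa.shape[1]):
        # A permutation keeps the multiset of symbols in column j,
        # so single-site frequencies are unchanged.
        out[:, j] = rng.permutation(msa[:, j])
    return out


def site_frequencies(msa, q):
    """Return an (L, q) array of per-site symbol frequencies."""
    msa = np.asarray(msa)
    counts = np.stack(
        [np.bincount(msa[:, j], minlength=q) for j in range(msa.shape[1])]
    )
    return counts / msa.shape[0]


# Toy alignment: 6 sequences, 3 sites, alphabet of q = 4 symbols (hypothetical data).
toy_msa = np.array(
    [
        [0, 1, 2],
        [0, 1, 2],
        [0, 3, 3],
        [1, 3, 3],
        [1, 1, 2],
        [1, 3, 3],
    ]
)
shuffled = shuffle_columns(toy_msa, seed=1)
```

Comparing couplings inferred from `toy_msa` with those inferred from `shuffled` would isolate finite-sample noise only; separating out phylogeny-induced couplings requires a resampling scheme that keeps the tree structure, as the abstract states.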