Impact of phylogeny on structural contact inference from protein sequence data
Local and global inference methods have been developed to infer structural contacts from multiple sequence alignments of homologous proteins. They rely on correlations in amino acid usage at contacting sites. Because homologous proteins share a common ancestry, their sequences also feature phylogene...
Main authors: | Dietler, Nicola; Lupo, Umberto; Bitbol, Anne-Florence |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | The Royal Society, 2023 |
Subjects: | Life Sciences–Physics interface |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9905998/ https://www.ncbi.nlm.nih.gov/pubmed/36751926 http://dx.doi.org/10.1098/rsif.2022.0707 |
_version_ | 1784883920299360256 |
---|---|
author | Dietler, Nicola Lupo, Umberto Bitbol, Anne-Florence |
author_facet | Dietler, Nicola Lupo, Umberto Bitbol, Anne-Florence |
author_sort | Dietler, Nicola |
collection | PubMed |
description | Local and global inference methods have been developed to infer structural contacts from multiple sequence alignments of homologous proteins. They rely on correlations in amino acid usage at contacting sites. Because homologous proteins share a common ancestry, their sequences also feature phylogenetic correlations, which can impair contact inference. We investigate this effect by generating controlled synthetic data from a minimal model where the importance of contacts and of phylogeny can be tuned. We demonstrate that global inference methods, specifically Potts models, are more resilient to phylogenetic correlations than local methods, based on covariance or mutual information. This holds whether or not phylogenetic corrections are used, and may explain the success of global methods. We analyse the roles of selection strength and of phylogenetic relatedness. We show that sites that mutate early in the phylogeny yield false positive contacts. We consider natural data and realistic synthetic data, and our findings generalize to these cases. Our results highlight the impact of phylogeny on contact prediction from protein sequences and illustrate the interplay between the rich structure of biological data and inference. |
format | Online Article Text |
id | pubmed-9905998 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | The Royal Society |
record_format | MEDLINE/PubMed |
spelling | pubmed-9905998 2023-02-09 Impact of phylogeny on structural contact inference from protein sequence data Dietler, Nicola Lupo, Umberto Bitbol, Anne-Florence J R Soc Interface Life Sciences–Physics interface Local and global inference methods have been developed to infer structural contacts from multiple sequence alignments of homologous proteins. They rely on correlations in amino acid usage at contacting sites. Because homologous proteins share a common ancestry, their sequences also feature phylogenetic correlations, which can impair contact inference. We investigate this effect by generating controlled synthetic data from a minimal model where the importance of contacts and of phylogeny can be tuned. We demonstrate that global inference methods, specifically Potts models, are more resilient to phylogenetic correlations than local methods, based on covariance or mutual information. This holds whether or not phylogenetic corrections are used, and may explain the success of global methods. We analyse the roles of selection strength and of phylogenetic relatedness. We show that sites that mutate early in the phylogeny yield false positive contacts. We consider natural data and realistic synthetic data, and our findings generalize to these cases. Our results highlight the impact of phylogeny on contact prediction from protein sequences and illustrate the interplay between the rich structure of biological data and inference. The Royal Society 2023-02-08 /pmc/articles/PMC9905998/ /pubmed/36751926 http://dx.doi.org/10.1098/rsif.2022.0707 Text en © 2023 The Authors. https://creativecommons.org/licenses/by/4.0/ Published by the Royal Society under the terms of the Creative Commons Attribution License http://creativecommons.org/licenses/by/4.0/ (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, provided the original author and source are credited. |
spellingShingle | Life Sciences–Physics interface Dietler, Nicola Lupo, Umberto Bitbol, Anne-Florence Impact of phylogeny on structural contact inference from protein sequence data |
title | Impact of phylogeny on structural contact inference from protein sequence data |
title_full | Impact of phylogeny on structural contact inference from protein sequence data |
title_fullStr | Impact of phylogeny on structural contact inference from protein sequence data |
title_full_unstemmed | Impact of phylogeny on structural contact inference from protein sequence data |
title_short | Impact of phylogeny on structural contact inference from protein sequence data |
title_sort | impact of phylogeny on structural contact inference from protein sequence data |
topic | Life Sciences–Physics interface |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9905998/ https://www.ncbi.nlm.nih.gov/pubmed/36751926 http://dx.doi.org/10.1098/rsif.2022.0707 |
work_keys_str_mv | AT dietlernicola impactofphylogenyonstructuralcontactinferencefromproteinsequencedata AT lupoumberto impactofphylogenyonstructuralcontactinferencefromproteinsequencedata AT bitbolanneflorence impactofphylogenyonstructuralcontactinferencefromproteinsequencedata |