
Impact of phylogeny on structural contact inference from protein sequence data

Local and global inference methods have been developed to infer structural contacts from multiple sequence alignments of homologous proteins. They rely on correlations in amino acid usage at contacting sites. Because homologous proteins share a common ancestry, their sequences also feature phylogene...


Bibliographic Details
Main authors: Dietler, Nicola; Lupo, Umberto; Bitbol, Anne-Florence
Format: Online Article Text
Language: English
Published: The Royal Society 2023
Subjects: Life Sciences–Physics interface
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9905998/
https://www.ncbi.nlm.nih.gov/pubmed/36751926
http://dx.doi.org/10.1098/rsif.2022.0707
author Dietler, Nicola
Lupo, Umberto
Bitbol, Anne-Florence
collection PubMed
description Local and global inference methods have been developed to infer structural contacts from multiple sequence alignments of homologous proteins. They rely on correlations in amino acid usage at contacting sites. Because homologous proteins share a common ancestry, their sequences also feature phylogenetic correlations, which can impair contact inference. We investigate this effect by generating controlled synthetic data from a minimal model where the importance of contacts and of phylogeny can be tuned. We demonstrate that global inference methods, specifically Potts models, are more resilient to phylogenetic correlations than local methods, based on covariance or mutual information. This holds whether or not phylogenetic corrections are used, and may explain the success of global methods. We analyse the roles of selection strength and of phylogenetic relatedness. We show that sites that mutate early in the phylogeny yield false positive contacts. We consider natural data and realistic synthetic data, and our findings generalize to these cases. Our results highlight the impact of phylogeny on contact prediction from protein sequences and illustrate the interplay between the rich structure of biological data and inference.
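As an illustration of the local scores named in the description, the sketch below ranks pairs of alignment columns by mutual information. It is a minimal toy example and not the authors' code: the function names `column_mutual_information` and `rank_pairs` and the four-column alignment `toy_msa` are hypothetical, and the sketch applies no phylogenetic correction and fits no Potts model.

```python
from collections import Counter
from itertools import combinations
from math import log


def column_mutual_information(msa, i, j):
    """Mutual information (in nats) between alignment columns i and j.

    `msa` is a list of equal-length strings, one aligned sequence each.
    Frequencies are raw counts; no pseudocounts or sequence reweighting
    are applied (an assumption made to keep the sketch short).
    """
    n = len(msa)
    pair_counts = Counter((seq[i], seq[j]) for seq in msa)
    counts_i = Counter(seq[i] for seq in msa)
    counts_j = Counter(seq[j] for seq in msa)
    mi = 0.0
    for (a, b), c in pair_counts.items():
        p_ab = c / n
        # p_ab / (p_a * p_b), written with counts to limit rounding error
        mi += p_ab * log(p_ab * n * n / (counts_i[a] * counts_j[b]))
    return mi


def rank_pairs(msa):
    """Return all column pairs sorted by decreasing mutual information."""
    length = len(msa[0])
    scores = {
        (i, j): column_mutual_information(msa, i, j)
        for i, j in combinations(range(length), 2)
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical toy alignment: columns 0 and 1 always change together.
toy_msa = ["AKLE", "AKVD", "GRLE", "GRVD", "AKLD", "GRVE"]
ranked = rank_pairs(toy_msa)
top_pair, top_score = ranked[0]
print(top_pair, round(top_score, 3))
```

In this toy alignment the perfectly covarying columns 0 and 1 receive the highest score. In real alignments, shared ancestry can produce comparably high scores between sites that are not in contact, which is the effect the description refers to.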
format Online
Article
Text
id pubmed-9905998
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher The Royal Society
record_format MEDLINE/PubMed
spelling pubmed-9905998 2023-02-09 J R Soc Interface Life Sciences–Physics interface The Royal Society 2023-02-08 /pmc/articles/PMC9905998/ /pubmed/36751926 http://dx.doi.org/10.1098/rsif.2022.0707 Text en © 2023 The Authors. Published by the Royal Society under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, provided the original author and source are credited.
title Impact of phylogeny on structural contact inference from protein sequence data
topic Life Sciences–Physics interface