
Phylogenetic correlations can suffice to infer protein partners from sequences

Determining which proteins interact together is crucial to a systems-level understanding of the cell. Recently, algorithms based on Direct Coupling Analysis (DCA) pairwise maximum-entropy models have made it possible to identify interaction partners among paralogous proteins from sequence data. This success...


Bibliographic Details
Main authors: Marmier, Guillaume, Weigt, Martin, Bitbol, Anne-Florence
Format: Online Article Text
Language: English
Published: Public Library of Science 2019
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6812855/
https://www.ncbi.nlm.nih.gov/pubmed/31609984
http://dx.doi.org/10.1371/journal.pcbi.1007179
_version_ 1783462726597607424
author Marmier, Guillaume
Weigt, Martin
Bitbol, Anne-Florence
author_facet Marmier, Guillaume
Weigt, Martin
Bitbol, Anne-Florence
author_sort Marmier, Guillaume
collection PubMed
description Determining which proteins interact together is crucial to a systems-level understanding of the cell. Recently, algorithms based on Direct Coupling Analysis (DCA) pairwise maximum-entropy models have made it possible to identify interaction partners among paralogous proteins from sequence data. This success of DCA at predicting protein-protein interactions could be mainly based on its known ability to identify pairs of residues that are in contact in the three-dimensional structure of protein complexes and that coevolve to remain physicochemically complementary. However, interacting proteins possess similar evolutionary histories. What is the role of purely phylogenetic correlations in the performance of DCA-based methods to infer interaction partners? To address this question, we employ controlled synthetic data that only involve phylogeny and no interactions or contacts. We find that DCA accurately identifies the pairs of synthetic sequences that share evolutionary history. While phylogenetic correlations confound the identification of contacting residues by DCA, they are thus useful to predict interacting partners among paralogs. We find that DCA performs as well as phylogenetic methods to this end, and slightly better than them with large and accurate training sets. Employing DCA or phylogenetic methods within an Iterative Pairing Algorithm (IPA) makes it possible to predict pairs of evolutionary partners without a training set. We further demonstrate the ability of these various methods to correctly predict pairings among real paralogous proteins with genome proximity but no known direct physical interaction, illustrating the importance of phylogenetic correlations in natural data. However, for physically interacting and strongly coevolving proteins, DCA and mutual information outperform phylogenetic methods. We finally discuss how to distinguish physically interacting proteins from proteins that only share a common evolutionary history.
format Online
Article
Text
id pubmed-6812855
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-6812855 2019-11-02 Phylogenetic correlations can suffice to infer protein partners from sequences Marmier, Guillaume Weigt, Martin Bitbol, Anne-Florence PLoS Comput Biol Research Article Public Library of Science 2019-10-14 /pmc/articles/PMC6812855/ /pubmed/31609984 http://dx.doi.org/10.1371/journal.pcbi.1007179 Text en © 2019 Marmier et al http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
spellingShingle Research Article
Marmier, Guillaume
Weigt, Martin
Bitbol, Anne-Florence
Phylogenetic correlations can suffice to infer protein partners from sequences
title Phylogenetic correlations can suffice to infer protein partners from sequences
title_sort phylogenetic correlations can suffice to infer protein partners from sequences
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6812855/
https://www.ncbi.nlm.nih.gov/pubmed/31609984
http://dx.doi.org/10.1371/journal.pcbi.1007179
work_keys_str_mv AT marmierguillaume phylogeneticcorrelationscansufficetoinferproteinpartnersfromsequences
AT weigtmartin phylogeneticcorrelationscansufficetoinferproteinpartnersfromsequences
AT bitbolanneflorence phylogeneticcorrelationscansufficetoinferproteinpartnersfromsequences