Fast and Accurate Multivariate Gaussian Modeling of Protein Families: Predicting Residue Contacts and Protein-Interaction Partners

Bibliographic Details
Main authors: Baldassi, Carlo; Zamparo, Marco; Feinauer, Christoph; Procaccini, Andrea; Zecchina, Riccardo; Weigt, Martin; Pagnani, Andrea
Format: Online Article Text
Language: English
Published: Public Library of Science, 2014
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3963956/
https://www.ncbi.nlm.nih.gov/pubmed/24663061
http://dx.doi.org/10.1371/journal.pone.0092721
Collection: PubMed
Abstract: In the course of evolution, proteins show a remarkable conservation of their three-dimensional structure and their biological function, leading to strong evolutionary constraints on the sequence variability between homologous proteins. Our method aims to extract such constraints from rapidly accumulating sequence data, and thereby to infer protein structure and function from sequence information alone. Recently, global statistical inference methods (e.g. direct-coupling analysis, sparse inverse covariance estimation) have achieved a breakthrough towards this aim, and their predictions have been successfully incorporated into tertiary and quaternary protein structure prediction methods. However, due to the discrete nature of the underlying variables (amino acids), exact inference requires time exponential in the protein length, and efficient approximations are needed for practical applicability. Here we propose a very efficient multivariate Gaussian modeling approach as a variant of direct-coupling analysis: the discrete amino-acid variables are replaced by continuous Gaussian random variables. The resulting statistical inference problem is efficiently and exactly solvable. We show that the quality of inference is comparable or superior to that achieved by the mean-field approximations to inference with discrete variables used in direct-coupling analysis. This is true for (i) the prediction of residue-residue contacts in proteins, and (ii) the identification of protein-protein interaction partners in bacterial signal transduction. An implementation of our multivariate Gaussian approach is available at the website http://areeweb.polito.it/ricerca/cmp/code.
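
The abstract's key step, replacing the discrete amino-acid variables with continuous Gaussian ones, is what makes inference exact: for a multivariate Gaussian the pairwise couplings are, up to sign, the entries of the inverse covariance matrix, J = -C^{-1} (the same formula that mean-field DCA uses as an approximation for discrete variables). The Python/NumPy sketch below illustrates that idea on a one-hot encoded alignment and ranks residue pairs as contact candidates. It is not the authors' code, which is distributed at the URL given in the abstract. The function names, the alphabet, the uniform pseudocount of 0.5, the zero-sum gauge, the Frobenius-norm score with average-product correction (APC) and the minimum sequence separation are all assumptions borrowed from common DCA practice; the paper's own prior, sequence reweighting and the protein-interaction application are not reproduced.

```python
import numpy as np

# Assumed alphabet (hypothetical choice): 20 amino acids plus the gap symbol.
ALPHABET = "ACDEFGHIKLMNPQRSTVWY-"


def one_hot(msa, alphabet=ALPHABET):
    """Encode aligned sequences as 0/1 vectors with q - 1 entries per column.

    The last alphabet symbol is left implicit (all zeros) so that the
    covariance matrix is not singular by construction.
    """
    q = len(alphabet)
    index = {a: k for k, a in enumerate(alphabet)}
    length = len(msa[0])
    x = np.zeros((len(msa), length * (q - 1)))
    for s, seq in enumerate(msa):
        for i, a in enumerate(seq):
            k = index[a]
            if k < q - 1:
                x[s, i * (q - 1) + k] = 1.0
    return x


def zero_sum_gauge(block, q):
    """Pad a (q-1)x(q-1) coupling block to q x q and double-center it."""
    full = np.zeros((q, q))
    full[: q - 1, : q - 1] = block
    return (full - full.mean(axis=0, keepdims=True)
            - full.mean(axis=1, keepdims=True) + full.mean())


def gaussian_contact_scores(msa, alphabet=ALPHABET, pseudocount=0.5,
                            min_separation=5):
    """Rank column pairs (i, j) by coupling strength, strongest first.

    Returns a list of (score, i, j) tuples with j - i >= min_separation.
    """
    q = len(alphabet)
    d = q - 1
    length = len(msa[0])
    x = one_hot(msa, alphabet)
    n = x.shape[0]

    # Single- and pair-frequencies mixed with a uniform distribution
    # (assumed pseudocount scheme); for pseudocount > 0 this keeps the
    # covariance matrix positive definite, hence invertible.
    fi = (1 - pseudocount) * x.mean(axis=0) + pseudocount / q
    fij = (1 - pseudocount) * (x.T @ x) / n + pseudocount / q**2
    for i in range(length):
        blk = slice(i * d, (i + 1) * d)
        # A column holds one symbol at a time, so its own block is diag(f_i).
        fij[blk, blk] = np.diag(fi[blk])
    cov = fij - np.outer(fi, fi)

    # Gaussian model: couplings are the negative inverse covariance.
    couplings = -np.linalg.inv(cov)

    # Frobenius norm of each gauge-fixed inter-column coupling block.
    frob = np.zeros((length, length))
    for i in range(length):
        for j in range(i + 1, length):
            blk = couplings[i * d:(i + 1) * d, j * d:(j + 1) * d]
            frob[i, j] = frob[j, i] = np.linalg.norm(zero_sum_gauge(blk, q))

    # Average-product correction; the zero diagonal is excluded from the means.
    row_mean = frob.sum(axis=1) / (length - 1)
    total_mean = frob.sum() / (length * (length - 1))
    apc = frob - np.outer(row_mean, row_mean) / total_mean

    pairs = [(apc[i, j], i, j)
             for i in range(length)
             for j in range(i + min_separation, length)]
    return sorted(pairs, reverse=True)
```

For example, on 500 random two-letter sequences of length 6 in which column 3 copies column 0, calling gaussian_contact_scores(msa, alphabet="AB", min_separation=1) should rank the pair (0, 3) first. Dropping one state per column is what lets the covariance matrix be inverted directly; the gauge fix then makes the block scores comparable across columns.
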
Record ID: pubmed-3963956
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: PLoS One (Research Article)
Indexed record: pubmed-3963956, 2014-03-27
Publication date: 2014-03-24
Copyright: © 2014 Baldassi et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.