From Principal Component to Direct Coupling Analysis of Coevolution in Proteins: Low-Eigenvalue Modes are Needed for Structure Prediction

Various approaches have explored the covariation of residues in multiple-sequence alignments of homologous proteins to extract functional and structural information. Among those are principal component analysis (PCA), which identifies the most correlated groups of residues, and direct coupling analy...

Bibliographic Details
Main authors: Cocco, Simona; Monasson, Remi; Weigt, Martin
Format: Online Article Text
Language: English
Published: Public Library of Science 2013
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3749948/
https://www.ncbi.nlm.nih.gov/pubmed/23990764
http://dx.doi.org/10.1371/journal.pcbi.1003176
author Cocco, Simona
Monasson, Remi
Weigt, Martin
collection PubMed
description Various approaches have explored the covariation of residues in multiple-sequence alignments of homologous proteins to extract functional and structural information. Among those are principal component analysis (PCA), which identifies the most correlated groups of residues, and direct coupling analysis (DCA), a global inference method based on the maximum entropy principle, which aims at predicting residue-residue contacts. In this paper, inspired by the statistical physics of disordered systems, we introduce the Hopfield-Potts model to naturally interpolate between these two approaches. The Hopfield-Potts model allows us to identify relevant ‘patterns’ of residues from the knowledge of the eigenmodes and eigenvalues of the residue-residue correlation matrix. We show how the computation of such statistical patterns makes it possible to accurately predict residue-residue contacts with a much smaller number of parameters than DCA. This dimensional reduction allows us to avoid overfitting and to extract contact information from multiple-sequence alignments of reduced size. In addition, we show that low-eigenvalue correlation modes, discarded by PCA, are important to recover structural information: the corresponding patterns are highly localized, that is, they are concentrated in few sites, which we find to be in close contact in the three-dimensional protein fold.
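Illustrative note (not part of the catalogued record): the description above refers to the eigenmodes and eigenvalues of the residue-residue correlation matrix and to "localized" low-eigenvalue patterns. The sketch below shows one way such quantities could be computed for a toy alignment. Everything in it is an assumption made for illustration: the function names `one_hot`, `correlation_modes` and `ipr`, the three-letter alphabet, the diagonal regulariser (a simple stand-in, not the paper's pseudocount scheme), and the use of the inverse participation ratio as a localization score. It does not reproduce the authors' Hopfield-Potts inference.

```python
import numpy as np


def one_hot(msa, alphabet):
    """Encode an alignment (equal-length strings) as a binary matrix.

    Row m is sequence m; column i * q + k is 1 when site i carries
    symbol k of the alphabet (q = alphabet size).
    """
    index = {a: k for k, a in enumerate(alphabet)}
    q = len(alphabet)
    X = np.zeros((len(msa), len(msa[0]) * q))
    for m, seq in enumerate(msa):
        for i, a in enumerate(seq):
            X[m, i * q + index[a]] = 1.0
    return X


def correlation_modes(X, reg=1e-3):
    """Eigen-decompose the Pearson correlation matrix of the columns of X.

    `reg` is added to the diagonal of the covariance so that columns
    with zero variance do not cause a division by zero (assumption:
    a crude regulariser chosen for this toy example only).
    Eigenvalues are returned in ascending order.
    """
    C = np.cov(X, rowvar=False, bias=True)
    C = C + reg * np.eye(C.shape[0])
    d = np.sqrt(np.diag(C))
    G = C / np.outer(d, d)  # unit diagonal by construction
    return np.linalg.eigh(G)


def ipr(evecs):
    """Inverse participation ratio of each unit-norm eigenvector.

    Values near 1 mean the mode sits on few components (localized);
    values near 1/n mean it is spread over all n components.
    """
    return np.sum(evecs ** 4, axis=0)


# Toy alignment: sites 0 and 1 covary, site 2 is conserved.
msa = ["ACDA", "ACDC", "CADA", "CADC", "ACDA", "CADC"]
alphabet = "ACD"

evals, evecs = correlation_modes(one_hot(msa, alphabet))
scores = ipr(evecs)
for lam, s in zip(evals, scores):
    print(f"eigenvalue {lam:8.4f}   IPR {s:6.3f}")
```

PCA in the usual sense would keep only the modes at the end of this list (largest eigenvalues); the description's claim concerns the modes at the start of it.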
format Online
Article
Text
id pubmed-3749948
institution National Center for Biotechnology Information
language English
publishDate 2013
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-3749948 2013-08-29 PLoS Comput Biol Research Article
Public Library of Science 2013-08-22 /pmc/articles/PMC3749948/ /pubmed/23990764 http://dx.doi.org/10.1371/journal.pcbi.1003176 Text en © 2013 Cocco et al http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title From Principal Component to Direct Coupling Analysis of Coevolution in Proteins: Low-Eigenvalue Modes are Needed for Structure Prediction
topic Research Article