Protein 3D Structure Computed from Evolutionary Sequence Variation

Bibliographic Details
Main authors: Marks, Debora S.; Colwell, Lucy J.; Sheridan, Robert; Hopf, Thomas A.; Pagnani, Andrea; Zecchina, Riccardo; Sander, Chris
Format: Online, Article, Text
Language: English
Published: Public Library of Science, 2011
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3233603/
https://www.ncbi.nlm.nih.gov/pubmed/22163331
http://dx.doi.org/10.1371/journal.pone.0028766
Collection: PubMed
Description: The evolutionary trajectory of a protein through sequence space is constrained by its function. Collections of sequence homologs record the outcomes of millions of evolutionary experiments in which the protein evolves according to these constraints. Deciphering the evolutionary record held in these sequences and exploiting it for predictive and engineering purposes presents a formidable challenge. The potential benefit of solving this challenge is amplified by the advent of inexpensive high-throughput genomic sequencing. In this paper we ask whether we can infer evolutionary constraints from a set of sequence homologs of a protein. The challenge is to distinguish true co-evolution couplings from the noisy set of observed correlations. We address this challenge using a maximum entropy model of the protein sequence, constrained by the statistics of the multiple sequence alignment, to infer residue pair couplings. Surprisingly, we find that the strength of these inferred couplings is an excellent predictor of residue-residue proximity in folded structures. Indeed, the top-scoring residue couplings are sufficiently accurate and well-distributed to define the 3D protein fold with remarkable accuracy. We quantify this observation by computing, from sequence alone, all-atom 3D structures of fifteen test proteins from different fold classes, ranging in size from 50 to 260 residues, including a G-protein coupled receptor. These blinded inferences are de novo, i.e., they do not use homology modeling or sequence-similar fragments from known structures. The co-evolution signals provide sufficient information to determine accurate 3D protein structure to 2.7–4.8 Å Cα-RMSD error relative to the observed structure, over at least two-thirds of the protein (method called EVfold, details at http://EVfold.org).
This discovery provides insight into essential interactions constraining protein evolution and will facilitate a comprehensive survey of the universe of protein structures, new strategies in protein and drug design, and the identification of functional genetic variants in normal and disease genomes.
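The description states that residue-pair couplings are inferred from a maximum entropy model constrained by the statistics of the multiple sequence alignment, and that the top-scoring couplings predict residue-residue proximity. The Python sketch below illustrates that idea on a toy alignment; it is not EVfold's code. It assumes a mean-field approximation to fit the model (couplings taken as the negative inverse of the pseudocounted covariance matrix), a 21-state alphabet, a sequence-reweighting threshold `theta = 0.2` and a pseudocount `lam = 0.5`. It ranks pairs by the Frobenius norm of the couplings with average product correction, a commonly used score swapped in here that may differ from the ranking used in the paper. All function names and the toy data are hypothetical.

```python
import numpy as np

# Sketch only, under stated assumptions: mean-field fit of a pairwise maximum
# entropy model and Frobenius-norm + APC scoring. Not the EVfold implementation.

# Hypothetical encoding: gap symbol plus the 20 standard amino acids (q = 21 states).
ALPHABET = "-ACDEFGHIKLMNPQRSTVWY"
Q = len(ALPHABET)


def encode(msa):
    """Map equal-length aligned sequences to an N x L integer array (unknown symbols -> gap)."""
    index = {c: k for k, c in enumerate(ALPHABET)}
    return np.array([[index.get(c, 0) for c in seq.upper()] for seq in msa])


def sequence_weights(X, theta=0.2):
    """Weight each sequence by 1 / (number of sequences at >= 1 - theta identity to it)."""
    identity = (X[:, None, :] == X[None, :, :]).mean(axis=2)
    return 1.0 / (identity >= 1.0 - theta).sum(axis=1)


def frequencies(X, w, lam=0.5):
    """Weighted, pseudocounted single-site f_i(a) and pair f_ij(a, b) frequencies."""
    N, L = X.shape
    flat = np.eye(Q)[X].reshape(N, L * Q)          # one-hot encoding, N x (L*Q)
    weff = w.sum()
    fi = (w @ flat / weff).reshape(L, Q)
    fij = ((flat.T * w) @ flat / weff).reshape(L, Q, L, Q).transpose(0, 2, 1, 3)
    fi = (1 - lam) * fi + lam / Q
    fij = (1 - lam) * fij + lam / Q**2
    for i in range(L):                             # diagonal blocks: f_ii(a, b) = f_i(a) delta_ab
        fij[i, i] = np.diag(fi[i])
    return fi, fij


def mean_field_couplings(fi, fij):
    """Mean-field estimate e_ij(a, b) = -(C^-1)_{ia, jb}; the last state is the reference."""
    L, q1 = fi.shape[0], Q - 1
    C = fij[:, :, :q1, :q1] - np.einsum("ia,jb->ijab", fi[:, :q1], fi[:, :q1])
    C = C.transpose(0, 2, 1, 3).reshape(L * q1, L * q1)
    e = -np.linalg.inv(C).reshape(L, q1, L, q1).transpose(0, 2, 1, 3)
    J = np.zeros((L, L, Q, Q))
    J[:, :, :q1, :q1] = e                          # couplings to the reference state stay zero
    return J


def contact_scores(J):
    """Frobenius norm of zero-sum-gauge couplings, then average product correction (APC)."""
    L = J.shape[0]
    Jc = (J - J.mean(axis=2, keepdims=True) - J.mean(axis=3, keepdims=True)
          + J.mean(axis=(2, 3), keepdims=True))
    fn = np.sqrt((Jc ** 2).sum(axis=(2, 3)))
    np.fill_diagonal(fn, 0.0)
    row = fn.sum(axis=1) / (L - 1)
    apc = fn - np.outer(row, row) / (fn.sum() / (L * (L - 1)))
    np.fill_diagonal(apc, 0.0)
    return apc


def top_pairs(scores, n, min_sep=5):
    """The n highest-scoring pairs (i, j), i < j, at least min_sep apart in sequence."""
    L = scores.shape[0]
    ranked = sorted(((scores[i, j], i, j) for i in range(L) for j in range(i + min_sep, L)),
                    reverse=True)
    return [(i, j) for _, i, j in ranked[:n]]


# Toy alignment (hypothetical data): 500 random 10-residue sequences over D/E/K/R
# in which column 7 is a fixed charge-swapped copy of column 1.
rng = np.random.default_rng(0)
partner = {"D": "K", "E": "R", "K": "D", "R": "E"}
msa = []
for _ in range(500):
    s = list(rng.choice(list("DEKR"), size=10))
    s[7] = partner[s[1]]
    msa.append("".join(s))

X = encode(msa)
fi, fij = frequencies(X, sequence_weights(X))
scores = contact_scores(mean_field_couplings(fi, fij))
print(top_pairs(scores, 3))
```

On this toy alignment the planted pair of co-varying columns, (1, 7), is expected to score highest. In a real alignment the top-ranked pairs would serve as candidate contacts, that is, distance restraints for building a 3D model. The minimum sequence separation `min_sep` (an assumed value) excludes trivially close neighbours along the chain.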
ID: pubmed-3233603
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: PLoS One, Research Article. Public Library of Science, 2011-12-07 (record pubmed-3233603, 2011-12-12).
License: Marks et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.