A protein evolution model with independent sites that reproduces site-specific amino acid distributions from the Protein Data Bank

BACKGROUND: Since thermodynamic stability is a global property of proteins that has to be conserved during evolution, the selective pressure at a given site of a protein sequence depends on the amino acids present at other sites. However, models of molecular evolution that aim at reconstructing the...

Bibliographic Details
Main authors: Bastolla, Ugo; Porto, Markus; Roman, H Eduardo; Vendruscolo, Michele
Format: Text
Language: English
Published: BioMed Central 2006
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1570368/
https://www.ncbi.nlm.nih.gov/pubmed/16737532
http://dx.doi.org/10.1186/1471-2148-6-43
author Bastolla, Ugo
Porto, Markus
Roman, H Eduardo
Vendruscolo, Michele
collection PubMed
description BACKGROUND: Since thermodynamic stability is a global property of proteins that has to be conserved during evolution, the selective pressure at a given site of a protein sequence depends on the amino acids present at other sites. However, models of molecular evolution that aim at reconstructing the evolutionary history of macromolecules become computationally intractable if such correlations between sites are explicitly taken into account. RESULTS: We introduce an evolutionary model with sites evolving independently under a global constraint on the conservation of structural stability. This model consists of a selection process, which depends on two hydrophobicity parameters that can be computed from protein sequences without any fit, and a mutation process for which we consider various models. It quantitatively reproduces the results of Structurally Constrained Neutral (SCN) simulations of protein evolution in which the stability of the native state is explicitly computed and conserved. We then compare the predicted site-specific amino acid distributions with those sampled from the Protein Data Bank (PDB). The parameters of the mutation model, whose number varies between zero and five, are fitted to the data. The mean correlation coefficient between predicted and observed site-specific amino acid distributions exceeds 0.70 (<r> > 0.70) for a mutation model with no free parameters and no genetic code. In contrast, considering only the mutation process with no selection yields a mean correlation coefficient of <r> = 0.56 with three fitted parameters. The mutation model that best fits the data accounts for the increased mutation rate at CpG dinucleotides, yielding <r> = 0.90 with five parameters. CONCLUSION: The effective selection process that we propose reproduces the amino acid distributions observed in PDB protein sequences well. Its simplicity makes it very promising for likelihood calculations in phylogenetic studies.
Interestingly, in this approach the mutation process influences the effective selection process, i.e. selection and mutation must be entangled in order to obtain effectively independent sites. This interdependence between mutation and selection reflects the deep influence that mutation has on the evolutionary process: the mutational bias influences the thermodynamic properties of the evolving proteins, in agreement with comparative studies of bacterial proteomes, and it also influences the rate of accepted mutations.
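The summary statistic <r> quoted in the abstract, and the claim that independent sites keep likelihood calculations tractable, can be illustrated with a short Python sketch. It assumes that <r> is the Pearson correlation between the predicted and observed amino acid frequency vectors at each site, averaged over sites; the abstract does not spell out the exact procedure. All function names are hypothetical, and the per-site distributions themselves (which the paper derives from its selection and mutation processes) are not reproduced here; they are simply taken as inputs.

```python
import math
from typing import Dict, Sequence

# Hypothetical helpers for illustration; not taken from the paper.


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two vectors of equal length >= 2")
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return sxy / math.sqrt(sxx * syy)


def mean_site_correlation(
    predicted: Sequence[Sequence[float]],
    observed: Sequence[Sequence[float]],
) -> float:
    """Assumed definition of <r>: the per-site Pearson r between predicted
    and observed amino acid frequencies (e.g. 20 values per site, same
    amino acid order in both), averaged over all sites."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("need the same, non-zero number of sites")
    rs = [pearson(p, o) for p, o in zip(predicted, observed)]
    return sum(rs) / len(rs)


def independent_sites_log_likelihood(
    sequence: str, site_probs: Sequence[Dict[str, float]]
) -> float:
    """With independent sites, P(sequence) factorizes over sites, so the
    log-likelihood is a plain sum of per-site log-probabilities.
    A zero probability makes math.log raise ValueError."""
    if len(sequence) != len(site_probs):
        raise ValueError("one distribution per site is required")
    return sum(math.log(probs[aa]) for aa, probs in zip(sequence, site_probs))
```

For example, with toy three-state "distributions", `mean_site_correlation([[0.5, 0.3, 0.2], [0.1, 0.1, 0.8]], [[0.5, 0.3, 0.2], [0.8, 0.1, 0.1]])` averages r = 1 at the first site and r = -0.5 at the second, giving 0.25. The factorized log-likelihood is what makes independent-site models cheap: its cost grows linearly with sequence length instead of requiring a sum over coupled sites.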
format Text
id pubmed-1570368
institution National Center for Biotechnology Information
language English
publishDate 2006
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-1570368 2006-09-26 BMC Evol Biol, Research Article. BioMed Central 2006-05-31 /pmc/articles/PMC1570368/ /pubmed/16737532 http://dx.doi.org/10.1186/1471-2148-6-43 Text en. Copyright © 2006 Bastolla et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title A protein evolution model with independent sites that reproduces site-specific amino acid distributions from the Protein Data Bank
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1570368/
https://www.ncbi.nlm.nih.gov/pubmed/16737532
http://dx.doi.org/10.1186/1471-2148-6-43