
The intrinsic dimension of protein sequence evolution

It is well known that, in order to preserve its structure and function, a protein cannot change its sequence at random, but only by mutations occurring preferentially at specific locations. We here investigate quantitatively the amount of variability that is allowed in protein sequence evolution, by...


Bibliographic Details
Main authors: Facco, Elena; Pagnani, Andrea; Russo, Elena Tea; Laio, Alessandro
Format: Online Article Text
Language: English
Published: Public Library of Science 2019
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6472826/
https://www.ncbi.nlm.nih.gov/pubmed/30958823
http://dx.doi.org/10.1371/journal.pcbi.1006767
_version_ 1783412319865274368
author Facco, Elena
Pagnani, Andrea
Russo, Elena Tea
Laio, Alessandro
author_facet Facco, Elena
Pagnani, Andrea
Russo, Elena Tea
Laio, Alessandro
author_sort Facco, Elena
collection PubMed
description It is well known that, in order to preserve its structure and function, a protein cannot change its sequence at random, but only by mutations occurring preferentially at specific locations. We here investigate quantitatively the amount of variability that is allowed in protein sequence evolution, by computing the intrinsic dimension (ID) of the sequences belonging to a selection of protein families. The ID is a measure of the number of independent directions that evolution can take starting from a given sequence. We find that the ID is practically constant for sequences belonging to the same family, and moreover it is very similar in different families, with values ranging between 6 and 12. These values are significantly smaller than the raw number of amino acids, confirming the importance of correlations between mutations in different sites. However, we demonstrate that correlations are not sufficient to explain the small value of the ID we observe in protein families. Indeed, we show that the ID of a set of protein sequences generated by maximum entropy models, an approach in which correlations are accounted for, is typically significantly larger than the value observed in natural protein families. We further prove that a critical factor to reproduce the natural ID is to take into consideration the phylogeny of sequences.
format Online
Article
Text
id pubmed-6472826
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-6472826 2019-05-03 The intrinsic dimension of protein sequence evolution Facco, Elena Pagnani, Andrea Russo, Elena Tea Laio, Alessandro PLoS Comput Biol Research Article It is well known that, in order to preserve its structure and function, a protein cannot change its sequence at random, but only by mutations occurring preferentially at specific locations. We here investigate quantitatively the amount of variability that is allowed in protein sequence evolution, by computing the intrinsic dimension (ID) of the sequences belonging to a selection of protein families. The ID is a measure of the number of independent directions that evolution can take starting from a given sequence. We find that the ID is practically constant for sequences belonging to the same family, and moreover it is very similar in different families, with values ranging between 6 and 12. These values are significantly smaller than the raw number of amino acids, confirming the importance of correlations between mutations in different sites. However, we demonstrate that correlations are not sufficient to explain the small value of the ID we observe in protein families. Indeed, we show that the ID of a set of protein sequences generated by maximum entropy models, an approach in which correlations are accounted for, is typically significantly larger than the value observed in natural protein families. We further prove that a critical factor to reproduce the natural ID is to take into consideration the phylogeny of sequences.
Public Library of Science 2019-04-08 /pmc/articles/PMC6472826/ /pubmed/30958823 http://dx.doi.org/10.1371/journal.pcbi.1006767 Text en © 2019 Facco et al http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
spellingShingle Research Article
Facco, Elena
Pagnani, Andrea
Russo, Elena Tea
Laio, Alessandro
The intrinsic dimension of protein sequence evolution
title The intrinsic dimension of protein sequence evolution
title_full The intrinsic dimension of protein sequence evolution
title_fullStr The intrinsic dimension of protein sequence evolution
title_full_unstemmed The intrinsic dimension of protein sequence evolution
title_short The intrinsic dimension of protein sequence evolution
title_sort intrinsic dimension of protein sequence evolution
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6472826/
https://www.ncbi.nlm.nih.gov/pubmed/30958823
http://dx.doi.org/10.1371/journal.pcbi.1006767
work_keys_str_mv AT faccoelena theintrinsicdimensionofproteinsequenceevolution
AT pagnaniandrea theintrinsicdimensionofproteinsequenceevolution
AT russoelenatea theintrinsicdimensionofproteinsequenceevolution
AT laioalessandro theintrinsicdimensionofproteinsequenceevolution
AT faccoelena intrinsicdimensionofproteinsequenceevolution
AT pagnaniandrea intrinsicdimensionofproteinsequenceevolution
AT russoelenatea intrinsicdimensionofproteinsequenceevolution
AT laioalessandro intrinsicdimensionofproteinsequenceevolution