
Getting ‘ϕψχal’ with proteins: minimum message length inference of joint distributions of backbone and sidechain dihedral angles



Bibliographic Details
Main authors: Amarasinghe, Piyumi R, Allison, Lloyd, Stuckey, Peter J, Garcia de la Banda, Maria, Lesk, Arthur M, Konagurthu, Arun S
Format: Online Article Text
Language: English
Published: Oxford University Press 2023
Subjects: Macromolecular Sequence, Structure, and Function
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10311319/
https://www.ncbi.nlm.nih.gov/pubmed/37387189
http://dx.doi.org/10.1093/bioinformatics/btad251
author Amarasinghe, Piyumi R
Allison, Lloyd
Stuckey, Peter J
Garcia de la Banda, Maria
Lesk, Arthur M
Konagurthu, Arun S
collection PubMed
description The tendency of an amino acid to adopt certain configurations in folded proteins is treated here as a statistical estimation problem. We model the joint distribution of the observed mainchain and sidechain dihedral angles (ϕ, ψ, χ) of any amino acid by a mixture of products of von Mises probability distributions. This mixture model maps any vector of dihedral angles to a point on a multi-dimensional torus. The continuous space it uses to specify the dihedral angles provides an alternative to the commonly used rotamer libraries, which discretize the space of dihedral angles into coarse angular bins and cluster combinations of sidechain dihedral angles (χ) as a function of backbone (ϕ, ψ) conformations. A ‘good’ model is one that is both concise and explains (compresses) the observed data. Competing models can be compared directly; in particular, our model is shown to outperform the Dunbrack rotamer library in both model complexity (by three orders of magnitude) and fidelity (on average 20% more compression) when losslessly explaining the observed dihedral angle data across experimental resolutions of structures. Our method is unsupervised (with parameters estimated automatically) and uses information theory to determine the optimal complexity of the statistical model, thus avoiding under- and over-fitting, a common pitfall in model selection problems. Our models are computationally inexpensive to sample from and are geared to support a number of downstream studies, including experimental structure refinement, de novo protein design, and protein structure prediction. We call our collection of mixture models PhiSiCal (ϕψχal). AVAILABILITY AND IMPLEMENTATION: PhiSiCal mixture models and programs to sample from them are available for download at http://lcb.infotech.monash.edu.au/phisical.
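To make the model form in the description concrete, the following is a minimal sketch of a mixture of products of von Mises distributions over a vector of dihedral angles, with a density evaluator and a sampler. It is not the PhiSiCal implementation: the function names are hypothetical, the two components and their weights, means, and concentrations are made-up illustrative values rather than fitted parameters, and minimum message length parameter estimation is not shown.

```python
import numpy as np


def log_von_mises(theta, mu, kappa):
    """Log density of a von Mises distribution on the circle (radians)."""
    return kappa * np.cos(theta - mu) - np.log(2.0 * np.pi * np.i0(kappa))


def mixture_log_density(theta, weights, mus, kappas):
    """Log density of a mixture of products of von Mises distributions.

    theta:   (d,) vector of dihedral angles, e.g. (phi, psi, chi1)
    weights: (K,) mixing proportions summing to 1
    mus:     (K, d) mean angles per component and dimension
    kappas:  (K, d) concentrations per component and dimension
    """
    # Within a component the angles are independent, so log densities add.
    comp = log_von_mises(theta[None, :], mus, kappas).sum(axis=1)
    a = np.log(weights) + comp
    m = a.max()  # log-sum-exp for numerical stability
    return m + np.log(np.exp(a - m).sum())


def sample_mixture(n, weights, mus, kappas, rng):
    """Draw n angle vectors: pick a component, then sample each angle."""
    ks = rng.choice(len(weights), size=n, p=weights)
    return rng.vonmises(mus[ks], kappas[ks])  # shape (n, d), radians


# Illustrative (assumed) parameters for two components over (phi, psi, chi1).
weights = np.array([0.6, 0.4])
mus = np.radians([[-60.0, -45.0, -65.0], [-120.0, 130.0, 180.0]])
kappas = np.array([[8.0, 8.0, 5.0], [4.0, 4.0, 3.0]])

rng = np.random.default_rng(0)
samples = sample_mixture(1000, weights, mus, kappas, rng)
log_p = mixture_log_density(samples[0], weights, mus, kappas)
```

Sampling needs only a categorical draw followed by independent von Mises draws, which is consistent with the description's remark that such models are inexpensive to sample from. The negative log density (in bits, after dividing by ln 2) is the quantity a compression-based comparison would accumulate over the observed data, up to the cost of stating the angles to finite precision.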
format Online
Article
Text
id pubmed-10311319
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-10311319 2023-07-01 Getting ‘ϕψχal’ with proteins: minimum message length inference of joint distributions of backbone and sidechain dihedral angles Amarasinghe, Piyumi R Allison, Lloyd Stuckey, Peter J Garcia de la Banda, Maria Lesk, Arthur M Konagurthu, Arun S Bioinformatics Macromolecular Sequence, Structure, and Function Oxford University Press 2023-06-30 /pmc/articles/PMC10311319/ /pubmed/37387189 http://dx.doi.org/10.1093/bioinformatics/btad251 Text en © The Author(s) 2023. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title Getting ‘ϕψχal’ with proteins: minimum message length inference of joint distributions of backbone and sidechain dihedral angles
topic Macromolecular Sequence, Structure, and Function
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10311319/
https://www.ncbi.nlm.nih.gov/pubmed/37387189
http://dx.doi.org/10.1093/bioinformatics/btad251