
Neighbor-Dependent Ramachandran Probability Distributions of Amino Acids Developed from a Hierarchical Dirichlet Process Model

Distributions of the backbone dihedral angles of proteins have been studied for over 40 years. While many statistical analyses have been presented, only a handful of probability densities are publicly available for use in structure validation and structure prediction methods. The available distribut...


Bibliographic Details
Main Authors: Ting, Daniel, Wang, Guoli, Shapovalov, Maxim, Mitra, Rajib, Jordan, Michael I., Dunbrack, Roland L.
Format: Text
Language: English
Published: Public Library of Science 2010
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2861699/
https://www.ncbi.nlm.nih.gov/pubmed/20442867
http://dx.doi.org/10.1371/journal.pcbi.1000763
author Ting, Daniel
Wang, Guoli
Shapovalov, Maxim
Mitra, Rajib
Jordan, Michael I.
Dunbrack, Roland L.
collection PubMed
description Distributions of the backbone dihedral angles of proteins have been studied for over 40 years. While many statistical analyses have been presented, only a handful of probability densities are publicly available for use in structure validation and structure prediction methods. The available distributions differ in a number of important ways, which determine their usefulness for various purposes. These include: 1) input data size and criteria for structure inclusion (resolution, R-factor, etc.); 2) filtering of suspect conformations and outliers using B-factors or other features; 3) secondary structure of input data (e.g., whether helix and sheet are included; whether beta turns are included); 4) the method used for determining probability densities ranging from simple histograms to modern nonparametric density estimation; and 5) whether they include nearest neighbor effects on the distribution of conformations in different regions of the Ramachandran map. In this work, Ramachandran probability distributions are presented for residues in protein loops from a high-resolution data set with filtering based on calculated electron densities. Distributions for all 20 amino acids (with cis and trans proline treated separately) have been determined, as well as 420 left-neighbor and 420 right-neighbor dependent distributions. The neighbor-independent and neighbor-dependent probability densities have been accurately estimated using Bayesian nonparametric statistical analysis based on the Dirichlet process. In particular, we used hierarchical Dirichlet process priors, which allow sharing of information between densities for a particular residue type and different neighbor residue types. The resulting distributions are tested in a loop modeling benchmark with the program Rosetta, and are shown to improve protein loop conformation prediction significantly. The distributions are available at http://dunbrack.fccc.edu/hdp.
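As an illustration of the quantities the description refers to, the following minimal sketch computes a signed backbone dihedral angle from four atom positions. The function name `dihedral` and the toy coordinates are hypothetical and are not taken from the paper or its distributed data. The sketch only shows how a single angle on the Ramachandran map is obtained; it does not reproduce the hierarchical Dirichlet process density estimation.

```python
import math

def _sub(a, b):
    """Component-wise difference a - b of two 3D points."""
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees, in (-180, 180], for four 3D points.

    For the backbone angle phi of residue i the four atoms are
    C(i-1), N(i), CA(i), C(i); for psi they are N(i), CA(i), C(i), N(i+1).
    """
    b1 = _sub(p1, p0)
    b2 = _sub(p2, p1)
    b3 = _sub(p3, p2)
    n1 = _cross(b1, b2)   # normal of the plane through p0, p1, p2
    n2 = _cross(b2, b3)   # normal of the plane through p1, p2, p3
    x = _dot(n1, n2)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    return math.degrees(math.atan2(y, x))

# Toy coordinates (assumed for illustration only): a planar trans arrangement.
trans_angle = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1))
```

Using `atan2` on the two projections, rather than `acos` on a single dot product, keeps the sign of the angle, which matters because the Ramachandran map distinguishes positive and negative values of both angles.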
format Text
id pubmed-2861699
institution National Center for Biotechnology Information
language English
publishDate 2010
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-2861699 2010-05-04 PLoS Comput Biol Research Article Public Library of Science 2010-04-29 /pmc/articles/PMC2861699/ /pubmed/20442867 http://dx.doi.org/10.1371/journal.pcbi.1000763 Text en Ting et al. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title Neighbor-Dependent Ramachandran Probability Distributions of Amino Acids Developed from a Hierarchical Dirichlet Process Model
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2861699/
https://www.ncbi.nlm.nih.gov/pubmed/20442867
http://dx.doi.org/10.1371/journal.pcbi.1000763