Near-Native Protein Loop Sampling Using Nonparametric Density Estimation Accommodating Sparcity
Unlike core structural elements of a protein such as regular secondary structure, loop regions are difficult for template-based modeling (TBM) because of their variability in sequence and structure and the sparse sampling available from a limited number of homologous templates. We present a novel, knowl...
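Main Authors: | Joo, Hyun; Chavan, Archana G.; Day, Ryan; Lennox, Kristin P.; Sukhanov, Paul; Dahl, David B.; Vannucci, Marina; Tsai, Jerry |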
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Public Library of Science 2011 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3197639/ https://www.ncbi.nlm.nih.gov/pubmed/22028638 http://dx.doi.org/10.1371/journal.pcbi.1002234 |
_version_ | 1782214345425420288 |
---|---|
author | Joo, Hyun; Chavan, Archana G.; Day, Ryan; Lennox, Kristin P.; Sukhanov, Paul; Dahl, David B.; Vannucci, Marina; Tsai, Jerry |
author_sort | Joo, Hyun |
collection | PubMed |
description | Unlike core structural elements of a protein such as regular secondary structure, loop regions are difficult for template-based modeling (TBM) because of their variability in sequence and structure and the sparse sampling available from a limited number of homologous templates. We present a novel, knowledge-based method for loop sampling that leverages homologous torsion angle information to estimate a continuous joint backbone dihedral angle density at each loop position. The φ,ψ distributions are estimated via a Dirichlet process mixture of hidden Markov models (DPM-HMM). Models are quickly generated from samples of these distributions and enriched using an end-to-end distance filter. The performance of the DPM-HMM method was evaluated against a diverse test set in a leave-one-out approach. Candidates as low as 0.45 Å RMSD and with a worst case of 3.66 Å were produced. For canonical loops such as the immunoglobulin complementarity-determining regions (mean RMSD <2.0 Å), the DPM-HMM method performs as well as or better than the best templates, demonstrating that our automated method recaptures these canonical loops without any IgG-specific terms or manual intervention. In cases with poor or few good templates (mean RMSD >7.0 Å), this sampling method produces a population of loop structures to around 3.66 Å for loops up to 17 residues. In a direct comparison of sampling with the Loopy algorithm, our method demonstrates the ability to sample nearer-native structures for both the canonical CDRH1 and non-canonical CDRH3 loops. Lastly, under the realistic test conditions of the CASP9 experiment, successful application of DPM-HMM to 90 loops from 45 TBM targets shows the general applicability of our sampling method to the loop modeling problem. These results demonstrate that our DPM-HMM offers an advantage by consistently sampling near-native loop structures. The software used in this analysis is available for download at http://www.stat.tamu.edu/~dahl/software/cortorgles/. |
format | Online Article Text |
id | pubmed-3197639 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2011 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-3197639 2011-10-25 Near-Native Protein Loop Sampling Using Nonparametric Density Estimation Accommodating Sparcity Joo, Hyun; Chavan, Archana G.; Day, Ryan; Lennox, Kristin P.; Sukhanov, Paul; Dahl, David B.; Vannucci, Marina; Tsai, Jerry PLoS Comput Biol Research Article Unlike core structural elements of a protein such as regular secondary structure, loop regions are difficult for template-based modeling (TBM) because of their variability in sequence and structure and the sparse sampling available from a limited number of homologous templates. We present a novel, knowledge-based method for loop sampling that leverages homologous torsion angle information to estimate a continuous joint backbone dihedral angle density at each loop position. The φ,ψ distributions are estimated via a Dirichlet process mixture of hidden Markov models (DPM-HMM). Models are quickly generated from samples of these distributions and enriched using an end-to-end distance filter. The performance of the DPM-HMM method was evaluated against a diverse test set in a leave-one-out approach. Candidates as low as 0.45 Å RMSD and with a worst case of 3.66 Å were produced. For canonical loops such as the immunoglobulin complementarity-determining regions (mean RMSD <2.0 Å), the DPM-HMM method performs as well as or better than the best templates, demonstrating that our automated method recaptures these canonical loops without any IgG-specific terms or manual intervention. In cases with poor or few good templates (mean RMSD >7.0 Å), this sampling method produces a population of loop structures to around 3.66 Å for loops up to 17 residues. In a direct comparison of sampling with the Loopy algorithm, our method demonstrates the ability to sample nearer-native structures for both the canonical CDRH1 and non-canonical CDRH3 loops. Lastly, under the realistic test conditions of the CASP9 experiment, successful application of DPM-HMM to 90 loops from 45 TBM targets shows the general applicability of our sampling method to the loop modeling problem. These results demonstrate that our DPM-HMM offers an advantage by consistently sampling near-native loop structures. The software used in this analysis is available for download at http://www.stat.tamu.edu/~dahl/software/cortorgles/. Public Library of Science 2011-10-20 /pmc/articles/PMC3197639/ /pubmed/22028638 http://dx.doi.org/10.1371/journal.pcbi.1002234 Text en Joo et al. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited. |
title | Near-Native Protein Loop Sampling Using Nonparametric Density Estimation Accommodating Sparcity |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3197639/ https://www.ncbi.nlm.nih.gov/pubmed/22028638 http://dx.doi.org/10.1371/journal.pcbi.1002234 |
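
The description above mentions two generic steps: enriching sampled loop models with an end-to-end distance filter, and scoring candidates by RMSD to the native loop. The following is a minimal Python sketch of those two ideas only. It is not the authors' cortorgles software and does not implement the DPM-HMM itself; the function names, the 1.0 Å tolerance, and the toy Cα-like coordinates are all assumptions made for illustration, and the RMSD here is computed without any structural superposition.

```python
import math

def end_to_end_distance(coords):
    """Distance (Å) between the first and last points of a loop trace."""
    return math.dist(coords[0], coords[-1])

def filter_by_end_to_end(candidates, target_distance, tolerance=1.0):
    """Keep candidates whose end-to-end distance is within `tolerance` Å
    of the gap the loop has to span (tolerance is an assumed value)."""
    return [
        c for c in candidates
        if abs(end_to_end_distance(c) - target_distance) <= tolerance
    ]

def rmsd(a, b):
    """Coordinate RMSD between two equal-length point lists (no superposition)."""
    if len(a) != len(b):
        raise ValueError("structures must have the same number of points")
    squared = sum(math.dist(p, q) ** 2 for p, q in zip(a, b))
    return math.sqrt(squared / len(a))

# Toy data (hypothetical): a three-point "native" loop and two sampled candidates.
native = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
candidates = [
    [(0.0, 0.0, 0.0), (3.8, 0.5, 0.0), (7.6, 0.0, 0.0)],  # closes the gap
    [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0)],  # end point too close
]

target = end_to_end_distance(native)
kept = filter_by_end_to_end(candidates, target)
for model in kept:
    print(f"kept candidate, RMSD to native = {rmsd(model, native):.3f} Å")
```

In this toy case only the first candidate survives the filter, because the second one ends about 2.2 Å short of the gap it would have to span. A real pipeline would build coordinates from sampled φ,ψ angles and superimpose structures before computing RMSD.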