
Near-Native Protein Loop Sampling Using Nonparametric Density Estimation Accommodating Sparcity

Unlike core structural elements of a protein such as regular secondary structure, loop regions are difficult for template-based modeling (TBM) because of their variability in sequence and structure and the sparse sampling available from a limited number of homologous templates. We present a novel, knowl...


Bibliographic Details
Main Authors: Joo, Hyun, Chavan, Archana G., Day, Ryan, Lennox, Kristin P., Sukhanov, Paul, Dahl, David B., Vannucci, Marina, Tsai, Jerry
Format: Online Article Text
Language: English
Published: Public Library of Science 2011
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3197639/
https://www.ncbi.nlm.nih.gov/pubmed/22028638
http://dx.doi.org/10.1371/journal.pcbi.1002234
_version_ 1782214345425420288
author Joo, Hyun
Chavan, Archana G.
Day, Ryan
Lennox, Kristin P.
Sukhanov, Paul
Dahl, David B.
Vannucci, Marina
Tsai, Jerry
collection PubMed
description Unlike core structural elements of a protein such as regular secondary structure, loop regions are difficult for template-based modeling (TBM) because of their variability in sequence and structure and the sparse sampling available from a limited number of homologous templates. We present a novel, knowledge-based method for loop sampling that leverages homologous torsion angle information to estimate a continuous joint backbone dihedral angle density at each loop position. The φ,ψ distributions are estimated via a Dirichlet process mixture of hidden Markov models (DPM-HMM). Models are quickly generated from samples of these distributions and are enriched using an end-to-end distance filter. The performance of the DPM-HMM method was evaluated against a diverse test set in a leave-one-out approach. Candidates as low as 0.45 Å RMSD and with a worst case of 3.66 Å were produced. For canonical loops such as the immunoglobulin complementarity-determining regions (mean RMSD <2.0 Å), the DPM-HMM method performs as well as or better than the best templates, demonstrating that our automated method recaptures these canonical loops without inclusion of any IgG-specific terms or manual intervention. In cases with poor or few good templates (mean RMSD >7.0 Å), this sampling method produces a population of loop structures to around 3.66 Å for loops up to 17 residues. In a direct comparison of sampling against the Loopy algorithm, our method demonstrates the ability to sample structures nearer to native for both the canonical CDRH1 and non-canonical CDRH3 loops. Lastly, in the realistic test conditions of the CASP9 experiment, successful application of DPM-HMM to 90 loops from 45 TBM targets shows the general applicability of our sampling method to the loop modeling problem. These results demonstrate that our DPM-HMM method provides an advantage by consistently sampling near-native loop structures.
The software used in this analysis is available for download at http://www.stat.tamu.edu/~dahl/software/cortorgles/.
format Online
Article
Text
id pubmed-3197639
institution National Center for Biotechnology Information
language English
publishDate 2011
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-3197639 2011-10-25 Near-Native Protein Loop Sampling Using Nonparametric Density Estimation Accommodating Sparcity Joo, Hyun Chavan, Archana G. Day, Ryan Lennox, Kristin P. Sukhanov, Paul Dahl, David B. Vannucci, Marina Tsai, Jerry PLoS Comput Biol Research Article Public Library of Science 2011-10-20 /pmc/articles/PMC3197639/ /pubmed/22028638 http://dx.doi.org/10.1371/journal.pcbi.1002234 Text en Joo et al. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title Near-Native Protein Loop Sampling Using Nonparametric Density Estimation Accommodating Sparcity
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3197639/
https://www.ncbi.nlm.nih.gov/pubmed/22028638
http://dx.doi.org/10.1371/journal.pcbi.1002234