Logistic regression models to predict solvent accessible residues using sequence- and homology-based qualitative and quantitative descriptors applied to a domain-complete X-ray structure learning set

A working example of relative solvent accessibility (RSA) prediction for proteins is presented. Novel logistic regression models with various qualitative descriptors that include amino acid type and quantitative descriptors that include 20- and six-term sequence entropy have been built and validated...

Bibliographic Details
Main Authors: Nepal, Reecha; Spencer, Joanna; Bhogal, Guneet; Nedunuri, Amulya; Poelman, Thomas; Kamath, Thejas; Chung, Edwin; Kantardjieff, Katherine; Gottlieb, Andrea; Lustig, Brooke
Format: Online Article Text
Language: English
Published: International Union of Crystallography 2015
Subjects: Teaching and Education
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4665666/
https://www.ncbi.nlm.nih.gov/pubmed/26664348
http://dx.doi.org/10.1107/S1600576715018531
collection PubMed
description A working example of relative solvent accessibility (RSA) prediction for proteins is presented. Novel logistic regression models with various qualitative descriptors that include amino acid type and quantitative descriptors that include 20- and six-term sequence entropy have been built and validated. A domain-complete learning set of over 1300 proteins is used to fit initial models with various sequence homology descriptors as well as query residue qualitative descriptors. Homology descriptors are derived from BLASTp sequence alignments, whereas the RSA values are determined directly from the crystal structure. The logistic regression models are fitted using dichotomous responses indicating buried or accessible solvent, with binary classifications obtained from the RSA values. The fitted models determine binary predictions of residue solvent accessibility with accuracies comparable to other less computationally intensive methods using the standard RSA threshold criteria 20 and 25% as solvent accessible. When an additional non-homology descriptor describing Lobanov–Galzitskaya residue disorder propensity is included, incremental improvements in accuracy are achieved with 25% threshold accuracies of 76.12 and 74.79% for the Manesh-215 and CASP(8+9) test sets, respectively. Moreover, the described software and the accompanying learning and validation sets allow students and researchers to explore the utility of RSA prediction with simple, physically intuitive models in any number of related applications.
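The description above names three ingredients: 20- and six-term sequence entropy descriptors, binary buried/accessible labels obtained from RSA thresholds, and a logistic regression fit. The following is a minimal Python sketch of how these pieces could fit together, not the authors' software. The six-class residue grouping, the `>=` comparison at the threshold, the plain gradient-descent fit, and all function names are assumptions made for illustration; the record does not specify any of them.

```python
import math
from collections import Counter

# Hypothetical six-class residue grouping (an assumption; the record
# does not list which classes the six-term entropy uses).
SIX_GROUPS = {
    "aliphatic": "AVLIMC",
    "aromatic": "FWYH",
    "polar": "STNQ",
    "positive": "KR",
    "negative": "DE",
    "special": "GP",
}
RESIDUE_TO_GROUP = {aa: g for g, members in SIX_GROUPS.items() for aa in members}


def shannon_entropy(symbols):
    """Shannon entropy in bits of the symbol frequencies in an iterable."""
    counts = Counter(symbols)
    total = sum(counts.values())
    return sum(-(n / total) * math.log2(n / total) for n in counts.values())


def column_entropies(column):
    """Return (20-term, six-term) entropy for one alignment column.

    Gaps and non-standard letters are skipped.
    """
    residues = [aa for aa in column.upper() if aa in RESIDUE_TO_GROUP]
    if not residues:
        return 0.0, 0.0
    e20 = shannon_entropy(residues)
    e6 = shannon_entropy(RESIDUE_TO_GROUP[aa] for aa in residues)
    return e20, e6


def is_accessible(rsa_percent, threshold=25.0):
    """Binary label from an RSA value; '>=' at the threshold is an assumption."""
    return rsa_percent >= threshold


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(rows, labels, learning_rate=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log-loss.

    This stands in for whatever fitting procedure the authors used.
    """
    n_features = len(rows[0])
    weights, bias = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for row, label in zip(rows, labels):
            z = bias + sum(w * x for w, x in zip(weights, row))
            error = sigmoid(z) - label
            grad_b += error
            for j, x in enumerate(row):
                grad_w[j] += error * x
        bias -= learning_rate * grad_b / len(rows)
        weights = [w - learning_rate * g / len(rows) for w, g in zip(weights, grad_w)]
    return weights, bias


def predict_accessible(weights, bias, row):
    """Predict 'accessible' when the modelled probability is at least 0.5."""
    return sigmoid(bias + sum(w * x for w, x in zip(weights, row))) >= 0.5


if __name__ == "__main__":
    # Made-up toy values for illustration only: one entropy feature per residue
    # and an RSA percentage that is turned into a binary label.
    toy_entropy = [[0.2], [0.5], [0.8], [2.5], [3.0], [3.4]]
    toy_rsa = [3.0, 10.0, 18.0, 40.0, 55.0, 70.0]
    toy_labels = [1 if is_accessible(r) else 0 for r in toy_rsa]
    w, b = fit_logistic(toy_entropy, toy_labels)
    print(column_entropies("AVLI"), predict_accessible(w, b, [3.2]))
```

In this sketch a conserved alignment column yields low entropy under both alphabets, while a column that varies only within one residue class has high 20-term but zero six-term entropy, which is one plausible reason for carrying both descriptors.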
id pubmed-4665666
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-4665666 2015-12-10 J Appl Crystallogr Teaching and Education
International Union of Crystallography 2015-11-10 /pmc/articles/PMC4665666/ /pubmed/26664348 http://dx.doi.org/10.1107/S1600576715018531 Text en © Reecha Nepal et al. 2015 http://creativecommons.org/licenses/by/2.0/uk/ This is an open-access article distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original authors and source are cited.
title Logistic regression models to predict solvent accessible residues using sequence- and homology-based qualitative and quantitative descriptors applied to a domain-complete X-ray structure learning set
topic Teaching and Education