Sigma-RF: prediction of the variability of spatial restraints in template-based modeling by random forest

BACKGROUND: In template-based modeling when using a single template, inter-atomic distances of an unknown protein structure are assumed to be distributed by Gaussian probability density functions, whose center peaks are located at the distances between corresponding atoms in the template structure....

Bibliographic Details
Main authors: Lee, Juyong, Lee, Kiho, Joung, InSuk, Joo, Keehyoung, Brooks, Bernard R, Lee, Jooyoung
Format: Online Article Text
Language: English
Published: BioMed Central 2015
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4374281/
https://www.ncbi.nlm.nih.gov/pubmed/25886990
http://dx.doi.org/10.1186/s12859-015-0526-z
author Lee, Juyong
Lee, Kiho
Joung, InSuk
Joo, Keehyoung
Brooks, Bernard R
Lee, Jooyoung
collection PubMed
description BACKGROUND: In template-based modeling with a single template, inter-atomic distances of an unknown protein structure are assumed to follow Gaussian probability density functions whose peaks are centered at the distances between corresponding atoms in the template structure. The width of the Gaussian distribution, that is, the variability of a spatial restraint, is closely related to the reliability of the restraint information extracted from a template, and it must be estimated accurately for successful template-based protein structure modeling.
RESULTS: To predict the variability of spatial restraints in template-based modeling, we devised a prediction model, Sigma-RF, using the random forest (RF) algorithm. Benchmark results on 22 CASP9 targets show that the variability values from Sigma-RF correlate more strongly with the true distance deviation than those from Modeller. We assessed the effect of the new sigma values by performing single-domain homology modeling of 22 CASP9 targets and 24 CASP10 targets. For most of the targets tested, the Sigma-RF values yielded more accurate 3D models than the Modeller values when both were applied to identical alignments.
CONCLUSIONS: By investigating the importance of the input features used in RF training, we find that the most important factor is quasi-local information: the average alignment quality of the residues located between and at two aligned residues. This average alignment quality is shown to be more important than the previously identified local quantity, the product of the alignment qualities at the two aligned residues.
ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-015-0526-z) contains supplementary material, which is available to authorized users.
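The Gaussian distance restraint described in the abstract can be illustrated with a minimal sketch. This is not the Sigma-RF or Modeller implementation; the function names, the example distances, and the two sigma values are hypothetical and chosen only to show how the width of the Gaussian controls how strongly a deviation from the template distance is penalized.

```python
import math


def gaussian_restraint_pdf(d, d_template, sigma):
    """Gaussian density of a model distance d, centered on the template distance.

    All names here are illustrative assumptions, not part of any published code.
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    z = (d - d_template) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def restraint_penalty(d, d_template, sigma):
    """Negative log-density of the Gaussian restraint (lower is better).

    Computed analytically to avoid taking the log of a value that underflows.
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    z = (d - d_template) / sigma
    return 0.5 * z * z + math.log(sigma * math.sqrt(2.0 * math.pi))


if __name__ == "__main__":
    d_template = 10.0  # hypothetical template distance
    d_model = 12.0     # hypothetical distance in a candidate model
    for sigma in (0.5, 2.0):  # a narrow (trusted) and a wide (uncertain) restraint
        penalty = restraint_penalty(d_model, d_template, sigma)
        print(f"sigma={sigma}: penalty={penalty:.3f}")
```

With a small sigma the same deviation from the template distance is penalized much more heavily than with a large sigma, which is why an accurate per-restraint estimate of sigma matters for the resulting models.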
format Online
Article
Text
id pubmed-4374281
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4374281 2015-03-27 Sigma-RF: prediction of the variability of spatial restraints in template-based modeling by random forest Lee, Juyong Lee, Kiho Joung, InSuk Joo, Keehyoung Brooks, Bernard R Lee, Jooyoung BMC Bioinformatics Methodology Article
BioMed Central 2015-03-21 /pmc/articles/PMC4374281/ /pubmed/25886990 http://dx.doi.org/10.1186/s12859-015-0526-z Text en © Lee et al.; licensee BioMed Central. 2015 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Sigma-RF: prediction of the variability of spatial restraints in template-based modeling by random forest
topic Methodology Article