
Ranking near-native candidate protein structures via random forest classification

BACKGROUND: In ab initio protein-structure predictions, a large set of structural decoys is often generated, with the requirement to select the best five or three candidates from the decoys. The clustered central structures with the largest number of neighbors are frequently regarded as the near-native pr...


Bibliographic Details
Main Authors:	Wu, Hongjie; Huang, Hongmei; Lu, Weizhong; Fu, Qiming; Ding, Yijie; Qiu, Jing; Li, Haiou
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2019
Subjects:	Research
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6929337/ https://www.ncbi.nlm.nih.gov/pubmed/31874596 http://dx.doi.org/10.1186/s12859-019-3257-8

author	Wu, Hongjie Huang, Hongmei Lu, Weizhong Fu, Qiming Ding, Yijie Qiu, Jing Li, Haiou
collection	PubMed
description	BACKGROUND: In ab initio protein-structure predictions, a large set of structural decoys is often generated, with the requirement to select the best five or three candidates from the decoys. The clustered central structures with the largest number of neighbors are frequently regarded as the near-native protein structures with the lowest free energy; however, limitations in clustering methods and three-dimensional structural-distance assessments make it difficult to identify the exact order of the best five or three near-native candidate structures. RESULTS: To address this issue, we propose a method that re-ranks the candidate structures via random forest classification using intra- and inter-cluster features from the results of the clustering. Comparative analysis indicated that our method was better able to identify the order of the candidate structures than the current methods SPICKER, Calibur, and Durandal. The results confirmed that the identified first model was closer to the native structure in 12 of 43 cases versus four for SPICKER, and the same as the native structure in up to 27 of 43 cases versus 14 for Calibur and up to eight of 43 cases versus two for Durandal. CONCLUSIONS: In this study, we presented an improved method based on random forest classification that transforms the problem of re-ranking the candidate structures into a binary classification problem. Our results indicate that this method is effective for the problem and performs better than the other methods compared.
format	Online Article Text
id	pubmed-6929337
institution	National Center for Biotechnology Information
language	English
publishDate	2019
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-6929337 2019-12-30 Ranking near-native candidate protein structures via random forest classification Wu, Hongjie Huang, Hongmei Lu, Weizhong Fu, Qiming Ding, Yijie Qiu, Jing Li, Haiou BMC Bioinformatics Research BioMed Central 2019-12-24 /pmc/articles/PMC6929337/ /pubmed/31874596 http://dx.doi.org/10.1186/s12859-019-3257-8 Text en © The Author(s). 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title	Ranking near-native candidate protein structures via random forest classification
topic	Research
