Identify High-Quality Protein Structural Models by Enhanced K-Means

Background. One critical issue in protein three-dimensional structure prediction using either ab initio or comparative modeling involves identification of high-quality protein structural models from generated decoys. Currently, clustering algorithms are widely used to identify near-native models; ho...

Bibliographic Details
Main authors: Wu, Hongjie; Li, Haiou; Jiang, Min; Chen, Cheng; Lv, Qiang; Wu, Chuang
Format: Online Article Text
Language: English
Published: Hindawi 2017
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5381204/
https://www.ncbi.nlm.nih.gov/pubmed/28421198
http://dx.doi.org/10.1155/2017/7294519
collection PubMed
description Background. One critical issue in protein three-dimensional structure prediction using either ab initio or comparative modeling involves identification of high-quality protein structural models from generated decoys. Currently, clustering algorithms are widely used to identify near-native models; however, their performance is dependent upon different conformational decoys, and, for some algorithms, the accuracy declines when the decoy population increases. Results. Here, we proposed two enhanced K-means clustering algorithms capable of robustly identifying high-quality protein structural models. The first one employs the clustering algorithm SPICKER to determine the initial centroids for basic K-means clustering (SK-means), whereas the other employs squared distance to optimize the initial centroids (K-means++). Our results showed that SK-means and K-means++ were more robust as compared with SPICKER alone, detecting 33 (59%) and 42 (75%) of 56 targets, respectively, with template modeling scores better than or equal to those of SPICKER. Conclusions. We observed that the classic K-means algorithm showed a similar performance to that of SPICKER, which is a widely used algorithm for protein-structure identification. Both SK-means and K-means++ demonstrated substantial improvements relative to results from SPICKER and classical K-means.
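The squared-distance seeding mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes each decoy has already been reduced to a fixed-length feature vector compared by Euclidean distance (decoy clustering would normally rely on a structural distance such as RMSD), and the names `squared_distance`, `kmeans_pp_init`, and `kmeans`, as well as the toy data, are hypothetical.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_pp_init(points, k, rng):
    """Pick k initial centroids by squared-distance-weighted sampling."""
    centroids = [rng.choice(points)]
    while len(centroids) < k:
        # Weight each point by its squared distance to the nearest chosen centroid.
        weights = [min(squared_distance(p, c) for c in centroids) for p in points]
        if sum(weights) == 0:
            # Every point coincides with a chosen centroid; fall back to uniform choice.
            centroids.append(rng.choice(points))
        else:
            centroids.append(rng.choices(points, weights=weights, k=1)[0])
    return centroids


def kmeans(points, centroids, iterations=20):
    """Plain Lloyd iterations starting from the given centroids."""
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: squared_distance(p, centroids[i]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its members; keep it if the cluster is empty.
        centroids = [
            tuple(sum(coord) / len(members) for coord in zip(*members)) if members else c
            for members, c in zip(clusters, centroids)
        ]
    return centroids, clusters


# Toy data (assumed): two well-separated groups of 2-D "decoy" vectors.
points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
rng = random.Random(0)
init = kmeans_pp_init(points, 2, rng)
centroids, clusters = kmeans(points, init)
print(init)
print(sorted(len(c) for c in clusters))
```

The SK-means variant described in the abstract would differ only in where the initial centroids come from: SPICKER's cluster centers would be passed to `kmeans` in place of the output of `kmeans_pp_init`.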
format Online
Article
Text
id pubmed-5381204
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher Hindawi
record_format MEDLINE/PubMed
spelling pubmed-5381204 2017-04-18 Biomed Res Int Research Article Hindawi 2017 2017-03-22 /pmc/articles/PMC5381204/ /pubmed/28421198 http://dx.doi.org/10.1155/2017/7294519 Text en Copyright © 2017 Hongjie Wu et al. https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
topic Research Article