Identify High-Quality Protein Structural Models by Enhanced K-Means
Main authors: | Wu, Hongjie; Li, Haiou; Jiang, Min; Chen, Cheng; Lv, Qiang; Wu, Chuang |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Hindawi, 2017 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5381204/ https://www.ncbi.nlm.nih.gov/pubmed/28421198 http://dx.doi.org/10.1155/2017/7294519 |
_version_ | 1782519892416659456 |
---|---|
author | Wu, Hongjie; Li, Haiou; Jiang, Min; Chen, Cheng; Lv, Qiang; Wu, Chuang |
collection | PubMed |
description | Background. One critical issue in protein three-dimensional structure prediction, whether by ab initio or comparative modeling, is identifying high-quality protein structural models among the generated decoys. Clustering algorithms are currently widely used to identify near-native models; however, their performance depends on the conformational decoys being clustered, and for some algorithms accuracy declines as the decoy population increases. Results. Here, we propose two enhanced K-means clustering algorithms capable of robustly identifying high-quality protein structural models. The first uses the clustering algorithm SPICKER to determine the initial centroids for basic K-means clustering (SK-means), whereas the second uses squared distances to optimize the initial centroids (K-means++). SK-means and K-means++ were more robust than SPICKER alone, selecting models with template modeling (TM) scores better than or equal to those of SPICKER for 33 (59%) and 42 (75%) of 56 targets, respectively. Conclusions. The classic K-means algorithm performed similarly to SPICKER, a widely used algorithm for protein-structure identification. Both SK-means and K-means++ demonstrated substantial improvements relative to SPICKER and classic K-means. |
format | Online Article Text |
id | pubmed-5381204 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2017 |
publisher | Hindawi |
record_format | MEDLINE/PubMed |
spelling | pubmed-5381204 2017-04-18 Identify High-Quality Protein Structural Models by Enhanced K-Means Wu, Hongjie Li, Haiou Jiang, Min Chen, Cheng Lv, Qiang Wu, Chuang Biomed Res Int Research Article Hindawi 2017 2017-03-22 /pmc/articles/PMC5381204/ /pubmed/28421198 http://dx.doi.org/10.1155/2017/7294519 Text en Copyright © 2017 Hongjie Wu et al. https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
title | Identify High-Quality Protein Structural Models by Enhanced K-Means |
topic | Research Article |
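
The abstract contrasts two ways of choosing initial K-means centroids: taking them from another clustering method (SPICKER, for SK-means) or sampling them by squared distance (K-means++). The Python below is a minimal sketch of those ideas, not the authors' implementation. It assumes each decoy has already been reduced to a fixed-length numeric vector (for example, coordinates after superposition) and uses plain squared Euclidean distance as a stand-in for whatever structural distance the paper actually uses. SPICKER itself is not reimplemented; externally computed centroids can be passed through the hypothetical `init` argument. The function names and the `pick_representative` rule (the member of the largest cluster closest to its centroid) are illustrative assumptions.

```python
import random
from typing import List, Optional, Sequence, Tuple

Vector = List[float]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_pp_init(points: List[Vector], k: int, rng: random.Random) -> List[Vector]:
    """K-means++ seeding: the first centroid is drawn uniformly; each later one is
    drawn with probability proportional to its squared distance to the nearest
    centroid chosen so far."""
    centroids = [list(rng.choice(points))]
    while len(centroids) < k:
        d2 = [min(sq_dist(p, c) for c in centroids) for p in points]
        total = sum(d2)
        if total == 0.0:  # every point coincides with a chosen centroid
            centroids.append(list(rng.choice(points)))
            continue
        r = rng.random() * total
        acc = 0.0
        for p, w in zip(points, d2):
            acc += w
            if acc > r:  # zero-weight points can never be selected here
                centroids.append(list(p))
                break
    return centroids


def kmeans(points: List[Vector], k: int, init: Optional[List[Vector]] = None,
           max_iter: int = 100, seed: int = 0) -> Tuple[List[Vector], List[int]]:
    """Lloyd's K-means. Starts from `init` if given (e.g. centroids from another
    clustering step, as in the SK-means idea); otherwise uses K-means++ seeding."""
    rng = random.Random(seed)
    if init is not None:
        if len(init) != k:
            raise ValueError("init must contain exactly k centroids")
        centroids = [list(c) for c in init]
    else:
        centroids = kmeans_pp_init(points, k, rng)
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points]
        if new_labels == labels:
            break  # converged: assignments no longer change
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


def pick_representative(points: List[Vector], centroids: List[Vector],
                        labels: List[int]) -> int:
    """Hypothetical selection rule: index of the member of the largest cluster
    that lies closest to that cluster's centroid."""
    largest = max(range(len(centroids)), key=labels.count)
    members = [i for i, lab in enumerate(labels) if lab == largest]
    return min(members, key=lambda i: sq_dist(points[i], centroids[largest]))


if __name__ == "__main__":
    # Toy 2-D stand-ins for decoys: a dense group near (0, 0), a sparse one near (5, 5).
    decoys = [[0.0, 0.1], [0.1, 0.0], [-0.1, 0.0], [0.0, -0.1], [0.05, 0.05],
              [5.0, 5.0], [5.2, 4.9]]
    centroids, labels = kmeans(decoys, k=2, seed=1)
    print(pick_representative(decoys, centroids, labels))  # prints 4
```

Passing centroids through `init` mimics the SK-means idea of seeding K-means from a prior clustering; leaving it out falls back to K-means++ seeding. Only the initialization differs between the two paths, which matches the abstract's framing of both methods as changes to how the initial centroids are chosen.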