
The comparison of automated clustering algorithms for resampling representative conformer ensembles with RMSD matrix

BACKGROUND: The accuracy of any 3D-QSAR, pharmacophore, or 3D-similarity-based chemometric target-fishing model is highly dependent on a reasonable sample of active conformations. A number of diverse conformational sampling algorithms exist that exhaustively generate enough conformers, howe...


Bibliographic Details
Main Authors:	Kim, Hyoungrae, Jang, Cheongyun, Yadav, Dharmendra K., Kim, Mi-hyun
Format:	Online Article Text
Language:	English
Published:	Springer International Publishing 2017
Subjects:	Methodology
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5364127/ https://www.ncbi.nlm.nih.gov/pubmed/29086188 http://dx.doi.org/10.1186/s13321-017-0208-0

author	Kim, Hyoungrae; Jang, Cheongyun; Yadav, Dharmendra K.; Kim, Mi-hyun
collection	PubMed
description	BACKGROUND: The accuracy of any 3D-QSAR, pharmacophore, or 3D-similarity-based chemometric target-fishing model is highly dependent on a reasonable sample of active conformations. A number of diverse conformational sampling algorithms exist that exhaustively generate enough conformers; however, model-building methods rely on an explicit number of common conformers. RESULTS: In this work, we attempted to build clustering algorithms that automatically find a reasonable number of representative conformer ensembles from an asymmetric dissimilarity matrix generated with the OpenEye toolkit. RMSD was the key descriptor (variable): each column of the N × N matrix was treated as one of N variables describing the relationship (network) between the conformer in a row and the other N conformers. This approach was used to evaluate the performance of well-known clustering algorithms by comparing how well they generate representative conformer ensembles, and to test their stability over different matrix transformation functions. In the network, the representative conformer group could be resampled by four kinds of algorithms with implicit parameters. The directed dissimilarity matrix is the only input to the clustering algorithms. CONCLUSIONS: The Dunn index, Davies–Bouldin index, eta-squared values, and omega-squared values were used to evaluate the clustering algorithms with respect to compactness and explanatory power. The evaluation also covers the reduction (abstraction) rate of the data, the correlation between the sizes of the population and the samples, the computational complexity, and the memory usage. Every algorithm found representative conformers automatically without any user intervention, and they reduced the data to 14–19% of the original values within 1.13 s per sample at most. The clustering methods are simple and practical, as they are fast and require no explicit parameters. RCDTC showed the highest Dunn and omega-squared values of the four algorithms, in addition to a consistent reduction rate between the population size and the sample size. The performance of the clustering algorithms was consistent over different transformation functions. Moreover, the clustering method can also be applied to the results of molecular dynamics sampling simulations. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s13321-017-0208-0) contains supplementary material, which is available to authorized users.
format	Online Article Text
id	pubmed-5364127
institution	National Center for Biotechnology Information
language	English
publishDate	2017
publisher	Springer International Publishing
record_format	MEDLINE/PubMed
spelling	pubmed-5364127 2017-04-10 J Cheminform Methodology Springer International Publishing 2017-03-23 /pmc/articles/PMC5364127/ /pubmed/29086188 http://dx.doi.org/10.1186/s13321-017-0208-0 Text en © The Author(s) 2017 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title	The comparison of automated clustering algorithms for resampling representative conformer ensembles with RMSD matrix
topic	Methodology
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5364127/ https://www.ncbi.nlm.nih.gov/pubmed/29086188 http://dx.doi.org/10.1186/s13321-017-0208-0
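
The abstract in this record evaluates clusterings of an asymmetric RMSD matrix with, among other measures, the Dunn index and the reduction rate (number of representatives divided by number of conformers). The following is a minimal sketch of those two measures only, not the article's algorithms (RCDTC and the others are not described in this record). It assumes that the directed matrix is symmetrized by averaging d(i, j) and d(j, i), which is just one possible transformation, and that one representative per cluster is chosen as the medoid. All function names and the example values are hypothetical.

```python
from typing import Dict, List, Sequence

Matrix = Sequence[Sequence[float]]


def symmetrize(d: Matrix) -> List[List[float]]:
    """Average d[i][j] and d[j][i] (an assumed transformation, not the article's)."""
    n = len(d)
    return [[(d[i][j] + d[j][i]) / 2.0 for j in range(n)] for i in range(n)]


def group_by_label(labels: Sequence[int]) -> Dict[int, List[int]]:
    """Map each cluster label to the list of conformer indices carrying it."""
    clusters: Dict[int, List[int]] = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    return clusters


def dunn_index(d: Matrix, labels: Sequence[int]) -> float:
    """Smallest between-cluster distance divided by largest within-cluster diameter."""
    clusters = list(group_by_label(labels).values())
    if len(clusters) < 2:
        raise ValueError("the Dunn index needs at least two clusters")
    max_diameter = max(d[i][j] for c in clusters for i in c for j in c)
    if max_diameter == 0:
        raise ValueError("all clusters have zero diameter")
    min_separation = min(
        d[i][j]
        for a in range(len(clusters))
        for b in range(a + 1, len(clusters))
        for i in clusters[a]
        for j in clusters[b]
    )
    return min_separation / max_diameter


def medoids(d: Matrix, labels: Sequence[int]) -> List[int]:
    """Pick, per cluster, the member with the smallest summed distance to its
    cluster mates; ties go to the lowest index."""
    return sorted(
        min(members, key=lambda i: sum(d[i][j] for j in members))
        for members in group_by_label(labels).values()
    )


def reduction_rate(n_representatives: int, n_conformers: int) -> float:
    """Fraction of the original conformers kept as representatives."""
    return n_representatives / n_conformers


if __name__ == "__main__":
    # Hypothetical directed RMSD values for five conformers (illustrative only).
    rmsd = [
        [0.0, 0.4, 0.7, 2.0, 2.2],
        [0.6, 0.0, 0.3, 2.1, 2.4],
        [0.5, 0.3, 0.0, 1.9, 2.0],
        [2.0, 2.3, 2.1, 0.0, 0.6],
        [2.4, 2.2, 2.0, 0.4, 0.0],
    ]
    labels = [0, 0, 0, 1, 1]  # cluster assignment from any clustering step
    sym = symmetrize(rmsd)
    reps = medoids(sym, labels)
    print("Dunn index:", dunn_index(sym, labels))
    print("representatives:", reps)
    print("reduction rate:", reduction_rate(len(reps), len(rmsd)))
```

A larger Dunn index indicates compact, well-separated clusters, and a smaller reduction rate means the representative set is a stronger abstraction of the original conformer pool.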
