PubChem3D: Biologically relevant 3-D similarity

Bibliographic details
Main authors: Kim, Sunghwan; Bolton, Evan E; Bryant, Stephen H
Format: Online, Article, Text
Language: English
Published: BioMed Central 2011
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3223603/
https://www.ncbi.nlm.nih.gov/pubmed/21781288
http://dx.doi.org/10.1186/1758-2946-3-26
Collection: PubMed
Description:
BACKGROUND: The use of 3-D similarity techniques in the analysis of biological data and virtual screening is pervasive, but what is a biologically meaningful 3-D similarity value? Can one find statistically significant separation between "active/active" and "active/inactive" spaces? These questions are explored using 734,486 biologically tested chemical structures, 1,389 biological assay data sets, and six different 3-D similarity types utilized by PubChem analysis tools.

RESULTS: The similarity value distributions of 269.7 billion unique conformer pairs from 734,486 biologically tested compounds (all-against-all) from PubChem were used to work toward an answer to the question: what is a biologically meaningful 3-D similarity score? The averages and standard deviations for the six similarity measures, built from the shape Tanimoto (ST), color Tanimoto (CT), and combo Tanimoto (ComboT), namely ST(ST-opt), CT(ST-opt), ComboT(ST-opt), ST(CT-opt), CT(CT-opt), and ComboT(CT-opt), were 0.54 ± 0.10, 0.07 ± 0.05, 0.62 ± 0.13, 0.41 ± 0.11, 0.18 ± 0.06, and 0.59 ± 0.14, respectively. Because this random distribution of biologically tested compounds was constructed using a single theoretical conformer per compound (the "default" conformer provided by PubChem), further study using multiple diverse conformers per compound may be necessary; however, given the breadth of the compound set, the single-conformer-per-compound results may still apply to multi-conformer-per-compound 3-D similarity value distributions. As such, this work is a critical step: it covers a very wide corpus of chemical structures and biological assays and creates a statistical framework to build upon. The second part of this study explored whether it was possible to realize a statistically meaningful separation in 3-D similarity values between reputed biological assay "inactives" and "actives".
Using the terms noninactive-noninactive (NN) pairs and noninactive-inactive (NI) pairs to represent comparisons within the "active/active" and "active/inactive" spaces, respectively, each of the 1,389 biological assays was examined for differences in 3-D similarity scores between its NN and NI pairs, and the results were analyzed across all assays and by assay category type. While a consistent trend of separation was observed, the result was not statistically unambiguous once the respective standard deviations were considered. Not all "actives" in a biological assay are amenable to this type of analysis, e.g., because of different mechanisms of action or binding configurations, but the ambiguous separation may also be due to the use of a single conformer per compound in this study. That said, there was a subset of biological assays in which a clear separation between the NN and NI pairs was found. In addition, combo Tanimoto (ComboT) alone, independent of superposition optimization type, appears to be the most efficient 3-D score type for identifying these cases.

CONCLUSION: This study provides a statistical guideline for analyzing biological assay data in terms of 3-D similarity and PubChem structure-activity analysis tools. When a single conformer per compound is used, only a relatively small number of assays appear able to separate "active/active" space from "active/inactive" space.
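The abstract's figures can be checked quickly, and its NN/NI comparison can be sketched in a few lines. The Python below is illustrative only and rests on several assumptions. First, "unique pairs" is taken to mean unordered pairs of distinct compounds, n(n−1)/2. Second, ComboT is taken to be the sum of ST and CT, which is the usual PubChem3D definition. Third, the NN/NI part uses hypothetical helpers (`nn_ni_scores`, `clearly_separated`) and a precomputed symmetric score matrix. Its "non-overlapping one-SD bands" rule is only one simple reading of "after considering the respective standard deviations" and is not necessarily the criterion the study used.

```python
"""Illustrative sketches around the statistics quoted in the abstract."""
from itertools import combinations
from statistics import mean, pstdev

# --- Part 1: arithmetic checks on the reported numbers ---------------------

N_COMPOUNDS = 734_486  # biologically tested compounds, one default conformer each


def unique_pairs(n: int) -> int:
    """Unordered pairs of distinct items (all-against-all, excluding self-pairs)."""
    return n * (n - 1) // 2


# Reported means only: the SD of a sum is not the sum of the SDs.
REPORTED_MEANS = {
    "ST-opt": {"ST": 0.54, "CT": 0.07, "ComboT": 0.62},
    "CT-opt": {"ST": 0.41, "CT": 0.18, "ComboT": 0.59},
}

# --- Part 2: NN vs NI comparison for one assay (hypothetical helpers) ------


def nn_ni_scores(labels, sim):
    """Split pairwise scores into NN and NI lists.

    labels: list of bools, True = noninactive in the assay.
    sim:    symmetric matrix (list of lists) of 3-D scores, e.g. ComboT.
    Inactive/inactive pairs belong to neither set and are skipped.
    """
    nn, ni = [], []
    for i, j in combinations(range(len(labels)), 2):
        if labels[i] and labels[j]:
            nn.append(sim[i][j])
        elif labels[i] != labels[j]:
            ni.append(sim[i][j])
    return nn, ni


def clearly_separated(nn, ni):
    """Assumed rule: the mean +/- one population SD bands do not overlap.

    Both lists must be non-empty (statistics.mean raises on empty input).
    """
    return mean(nn) - pstdev(nn) > mean(ni) + pstdev(ni)


if __name__ == "__main__":
    pairs = unique_pairs(N_COMPOUNDS)
    print(f"{pairs:,} unique pairs (~{pairs / 1e9:.1f} billion)")

    # If ComboT = ST + CT, then mean ComboT = mean ST + mean CT exactly,
    # so the published values should agree up to two-decimal rounding.
    for opt, m in REPORTED_MEANS.items():
        print(f"{opt}: ST + CT = {m['ST'] + m['CT']:.2f}, reported ComboT = {m['ComboT']:.2f}")

    # Invented toy data (not from the study): compounds 0-2 noninactive, 3-4 inactive.
    # Diagonal is 2.00 because ComboT self-similarity is 1 (ST) + 1 (CT).
    labels = [True, True, True, False, False]
    sim = [
        [2.00, 1.30, 1.25, 0.60, 0.55],
        [1.30, 2.00, 1.35, 0.65, 0.50],
        [1.25, 1.35, 2.00, 0.58, 0.62],
        [0.60, 0.65, 0.58, 2.00, 0.90],
        [0.55, 0.50, 0.62, 0.90, 2.00],
    ]
    nn, ni = nn_ni_scores(labels, sim)
    print(f"NN {mean(nn):.2f} ± {pstdev(nn):.2f}, NI {mean(ni):.2f} ± {pstdev(ni):.2f}")
    print("clearly separated:", clearly_separated(nn, ni))
```

Running the script prints 269,734,474,855 unique pairs, or about 269.7 billion, which matches the count in the abstract. The ST + CT sums are 0.61 and 0.59, against reported ComboT means of 0.62 and 0.59. Both pairs agree within two-decimal rounding. The toy matrix is invented so that its NN and NI bands are far apart, so the rule reports a clear separation. As the abstract notes, only a subset of real assays showed such a clear separation.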
Record ID: pubmed-3223603
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: J Cheminform (Research Article), BioMed Central, 2011-07-22
Rights: Copyright ©2011 Kim et al; licensee Chemistry Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.