PubChem3D: Shape compatibility filtering using molecular shape quadrupoles

BACKGROUND: PubChem provides a 3-D neighboring relationship, which involves finding the maximal shape overlap between two static compound 3-D conformations, a computationally intensive step. It is highly desirable to avoid this overlap computation, especially if it can be determined with certainty t...

Bibliographic Details
Main Authors: Kim, Sunghwan, Bolton, Evan E, Bryant, Stephen H
Format: Online Article Text
Language: English
Published: BioMed Central 2011
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3158422/
https://www.ncbi.nlm.nih.gov/pubmed/21774809
http://dx.doi.org/10.1186/1758-2946-3-25
_version_ 1782210374828818432
author Kim, Sunghwan
Bolton, Evan E
Bryant, Stephen H
author_facet Kim, Sunghwan
Bolton, Evan E
Bryant, Stephen H
author_sort Kim, Sunghwan
collection PubMed
description BACKGROUND: PubChem provides a 3-D neighboring relationship, which involves finding the maximal shape overlap between two static compound 3-D conformations, a computationally intensive step. It is highly desirable to avoid this overlap computation, especially if it can be determined with certainty that a conformer pair cannot meet the criteria to be a 3-D neighbor. As such, PubChem employs a series of pre-filters, based on the concept of volume, to remove approximately 65% of all conformer neighbor pairs prior to shape overlap optimization. Given that molecular volume, a somewhat vague concept, is rather effective, it leads one to wonder: can the existing PubChem 3-D neighboring relationship, which consists of billions of shape-similar conformer pairs from tens of millions of unique small molecules, be used to identify additional shape descriptor relationships? Or, put more specifically, can one place an upper bound on shape similarity using other "fuzzy" shape-like concepts like length, width, and height? RESULTS: Using a basis set of 4.18 billion 3-D neighbor pairs identified from single-conformer-per-compound neighboring of 17.1 million molecules, shape descriptors were computed for all conformers. These steric shape descriptors included several forms of molecular volume and shape quadrupoles, which essentially embody the length, width, and height of a conformer. For a given 3-D neighbor conformer pair, the volume and each quadrupole component (Q(x), Q(y), and Q(z)) were binned and their frequency of occurrence was examined. Per molecular volume type, this effectively produced three different maps, one per quadrupole component (Q(x), Q(y), and Q(z)), of allowed values for the similarity metric, shape Tanimoto (ST) ≥ 0.8.
The efficiency of these relationships (in terms of true positives, true negatives, false positives, and false negatives) as a function of ST threshold was determined in a test run of 13.2 billion conformer pairs not previously considered by the 3-D neighbor set. At an ST ≥ 0.8, a filtering efficiency of 40.4% of true negatives was achieved with only 32 false negatives out of 24 million true positives, when applying the separate Q(x), Q(y), and Q(z) maps in series (Q(xyz)). This efficiency increased linearly as a function of ST threshold in the range 0.8-0.99. The Q(x) filter was consistently the most efficient, followed by Q(y) and then by Q(z). Use of a monopole volume showed the best overall performance, followed by the self-overlap volume and then by the analytic volume. Application of the monopole-based Q(xyz) filter in a "real world" test of 3-D neighboring of 4,218 chemicals of biomedical interest against 26.1 million molecules in PubChem reduced the total CPU cost of neighboring by 24-38% and, if used as the initial filter, removed from consideration 48.3% of all conformer pairs at almost negligible computational overhead. CONCLUSION: Basic shape descriptors, such as those embodied by size, length, width, and height, can be highly effective in identifying shape-incompatible compound conformer pairs. When performing a 3-D search using a shape similarity cut-off, computation can be avoided by identifying conformer pairs that cannot meet the result criteria. Applying this methodology as a filter for PubChem 3-D neighboring computation, an improvement of 31% was realized, increasing the average conformer pair throughput from 154,000 to 202,000 per second per CPU core.
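The map-based filtering described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `Conformer` type, the bin widths, the choice to key each map on the binned (volume, quadrupole) values of both conformers, and the sample numbers are all assumptions made for illustration. The idea shown is only the general one: record which binned descriptor combinations occur among known neighbor pairs, then skip the expensive overlap optimization for any pair whose combination was never observed.

```python
from dataclasses import dataclass, field
from typing import NamedTuple


class Conformer(NamedTuple):
    """Hypothetical container for the descriptors named in the abstract."""
    volume: float   # some form of molecular volume
    q: tuple        # shape quadrupole components (Qx, Qy, Qz)


@dataclass
class ComponentMap:
    """Binned map of descriptor combinations seen among known neighbors.

    One map per quadrupole component. Bin widths are illustrative
    assumptions, not values taken from the article.
    """
    vol_width: float
    q_width: float
    allowed: set = field(default_factory=set)

    def _key(self, vol_a, q_a, vol_b, q_b):
        a = (int(vol_a // self.vol_width), int(q_a // self.q_width))
        b = (int(vol_b // self.vol_width), int(q_b // self.q_width))
        # Order the two cells so the key is the same for (a, b) and (b, a).
        return (a, b) if a <= b else (b, a)

    def record(self, vol_a, q_a, vol_b, q_b):
        self.allowed.add(self._key(vol_a, q_a, vol_b, q_b))

    def allows(self, vol_a, q_a, vol_b, q_b):
        return self._key(vol_a, q_a, vol_b, q_b) in self.allowed


def build_maps(neighbor_pairs, vol_width=10.0, q_width=1.0):
    """Build three maps (Qx, Qy, Qz) from pairs already known to be neighbors."""
    maps = [ComponentMap(vol_width, q_width) for _ in range(3)]
    for a, b in neighbor_pairs:
        for i, m in enumerate(maps):
            m.record(a.volume, a.q[i], b.volume, b.q[i])
    return maps


def may_be_neighbors(a, b, maps):
    """Apply the three maps in series; any miss rejects the pair."""
    return all(
        m.allows(a.volume, a.q[i], b.volume, b.q[i])
        for i, m in enumerate(maps)
    )


# Illustrative data only: one known neighbor pair used as the "basis set".
known = [(Conformer(300.0, (12.0, 5.0, 2.0)), Conformer(310.0, (12.5, 5.2, 2.1)))]
maps = build_maps(known)

compact = Conformer(312.0, (12.1, 5.1, 2.9))
similar = Conformer(305.0, (12.3, 5.9, 2.4))
rod_like = Conformer(305.0, (25.0, 3.0, 1.0))

print(may_be_neighbors(similar, compact, maps))
print(may_be_neighbors(rod_like, compact, maps))
```

A pair that passes this check still has to go through overlap optimization; only a rejection saves work. Because the maps are empirical, a rejection is reliable only to the extent that the basis set covers the descriptor space, which is consistent with the abstract reporting a small number of false negatives rather than none.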
format Online
Article
Text
id pubmed-3158422
institution National Center for Biotechnology Information
language English
publishDate 2011
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3158422 2011-08-20 PubChem3D: Shape compatibility filtering using molecular shape quadrupoles Kim, Sunghwan Bolton, Evan E Bryant, Stephen H J Cheminform Research Article BioMed Central 2011-07-20 /pmc/articles/PMC3158422/ /pubmed/21774809 http://dx.doi.org/10.1186/1758-2946-3-25 Text en Copyright ©2011 Kim et al; licensee Chemistry Central Ltd.
http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Research Article
Kim, Sunghwan
Bolton, Evan E
Bryant, Stephen H
PubChem3D: Shape compatibility filtering using molecular shape quadrupoles
title PubChem3D: Shape compatibility filtering using molecular shape quadrupoles
title_full PubChem3D: Shape compatibility filtering using molecular shape quadrupoles
title_fullStr PubChem3D: Shape compatibility filtering using molecular shape quadrupoles
title_full_unstemmed PubChem3D: Shape compatibility filtering using molecular shape quadrupoles
title_short PubChem3D: Shape compatibility filtering using molecular shape quadrupoles
title_sort pubchem3d: shape compatibility filtering using molecular shape quadrupoles
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3158422/
https://www.ncbi.nlm.nih.gov/pubmed/21774809
http://dx.doi.org/10.1186/1758-2946-3-25
work_keys_str_mv AT kimsunghwan pubchem3dshapecompatibilityfilteringusingmolecularshapequadrupoles
AT boltonevane pubchem3dshapecompatibilityfilteringusingmolecularshapequadrupoles
AT bryantstephenh pubchem3dshapecompatibilityfilteringusingmolecularshapequadrupoles