Similar compounds versus similar conformers: complementarity between PubChem 2-D and 3-D neighboring sets

BACKGROUND: PubChem is a public repository for biological activities of small molecules. For the efficient use of its vast amount of chemical information, PubChem performs 2-dimensional (2-D) and 3-dimensional (3-D) neighborings, which precompute “neighbor” relationships between molecules in the Pub...

Bibliographic Details
Main Authors: Kim, Sunghwan; Bolton, Evan E.; Bryant, Stephen H.
Format: Online Article Text
Language: English
Published: Springer International Publishing 2016
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5097428/
https://www.ncbi.nlm.nih.gov/pubmed/27872662
http://dx.doi.org/10.1186/s13321-016-0163-1
_version_ 1782465600513114112
author Kim, Sunghwan
Bolton, Evan E.
Bryant, Stephen H.
collection PubMed
description BACKGROUND: PubChem is a public repository for biological activities of small molecules. For the efficient use of its vast amount of chemical information, PubChem performs 2-dimensional (2-D) and 3-dimensional (3-D) neighborings, which precompute “neighbor” relationships between molecules in the PubChem Compound database, using the PubChem subgraph fingerprints-based 2-D similarity and the Gaussian-shape overlay-based 3-D similarity, respectively. These neighborings allow PubChem to provide the user with immediate access to the list of 2-D and 3-D neighbors (also called “Similar Compounds” and “Similar Conformers”, respectively) for each compound in PubChem. However, because 3-D neighboring is much more time-consuming than 2-D neighboring, how different the results of the two neighboring schemes are is an important question, considering limited computational resources.
RESULTS: The present study analyzed the complementarity between the PubChem 2-D and 3-D neighbors. When all compounds in PubChem were considered, the overlap between 2-D and 3-D neighbors was only 2% of the total neighbors. For the data sets containing compounds with annotated information, the overlap increased as the data sets became smaller. However, it did not exceed 31%, and substantial fractions of neighbors were still recognized by either PubChem 2-D or 3-D similarity, but not by both. The Neighbor Preference Index (NPI) of a molecule for a given data set was introduced, which quantified whether a molecule had more 2-D or 3-D neighbors in the data set. The NPI histogram for all PubChem compounds had a bimodal shape with two maxima at NPI = ±1 and a minimum at NPI = 0. However, the NPI histograms for the subsets containing compounds with annotated information had a greater fraction of compounds with a strong preference for one neighboring method over the other (at NPI = ±1) as well as compounds with a neutral preference (at NPI = 0).
CONCLUSION: The results of our study indicate that, for the majority of the compounds in PubChem, their structural similarity to other compounds can be recognized predominantly by either 2-D or 3-D neighborings, but not by both, showing a strong complementarity between 2-D and 3-D neighboring results. Therefore, despite its heavy requirements for computational resources, 3-D neighboring provides an alternative way in which the user can instantly access structurally similar molecules that cannot be detected if only 2-D neighboring is used. [Figure: see text]
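The two quantities named in the abstract, the overlap between neighbor sets and the Neighbor Preference Index, can be illustrated with a minimal sketch. The abstract gives neither formula, so both definitions below are assumptions: the overlap is taken as the shared neighbors divided by all distinct neighbors, and the NPI as a normalized difference of neighbor counts, which matches the stated range of -1 to +1 with 0 as neutral. The sign convention (positive means more 3-D neighbors), the function names, and the compound IDs are hypothetical.

```python
from typing import Optional, Set


def overlap_fraction(neighbors_2d: Set[int], neighbors_3d: Set[int]) -> float:
    """Fraction of all distinct neighbors found by both methods (assumed definition)."""
    all_neighbors = neighbors_2d | neighbors_3d
    if not all_neighbors:
        return 0.0
    return len(neighbors_2d & neighbors_3d) / len(all_neighbors)


def neighbor_preference_index(n_2d: int, n_3d: int) -> Optional[float]:
    """Normalized count difference in [-1, +1] (assumed form and sign convention).

    +1: only 3-D neighbors, -1: only 2-D neighbors, 0: equal counts.
    Returns None when the molecule has no neighbors at all.
    """
    total = n_2d + n_3d
    if total == 0:
        return None
    return (n_3d - n_2d) / total


# Hypothetical compound IDs, for illustration only.
similar_compounds = {101, 102, 103}   # 2-D neighbors
similar_conformers = {103, 104}       # 3-D neighbors

print(overlap_fraction(similar_compounds, similar_conformers))
print(neighbor_preference_index(len(similar_compounds), len(similar_conformers)))
```

Under these assumptions, a molecule whose neighbors come from only one method lands at one of the two extremes, which is where the abstract reports the maxima of the histogram for all PubChem compounds.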
format Online
Article
Text
id pubmed-5097428
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher Springer International Publishing
record_format MEDLINE/PubMed
spelling pubmed-5097428 2016-11-21 J Cheminform Research Article Springer International Publishing 2016-11-04 Text en © U.S. Government 2016
COPYRIGHT NOTICE. The article is a work of the United States Government; Title 17 U.S.C. 105 provides that copyright protection is not available for a work of the United States Government in the United States. Additionally, this is an open access article distributed under the terms of the Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0), which permits worldwide unrestricted use, distribution, and reproduction in any medium for any lawful purpose.
title Similar compounds versus similar conformers: complementarity between PubChem 2-D and 3-D neighboring sets
topic Research Article