Similar compounds versus similar conformers: complementarity between PubChem 2-D and 3-D neighboring sets
BACKGROUND: PubChem is a public repository for biological activities of small molecules. For the efficient use of its vast amount of chemical information, PubChem performs 2-dimensional (2-D) and 3-dimensional (3-D) neighborings, which precompute “neighbor” relationships between molecules in the Pub...
Main Authors: | Kim, Sunghwan; Bolton, Evan E.; Bryant, Stephen H. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Springer International Publishing 2016 |
Subjects: | Research Article |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5097428/ https://www.ncbi.nlm.nih.gov/pubmed/27872662 http://dx.doi.org/10.1186/s13321-016-0163-1 |
_version_ | 1782465600513114112 |
---|---|
author | Kim, Sunghwan Bolton, Evan E. Bryant, Stephen H. |
author_facet | Kim, Sunghwan Bolton, Evan E. Bryant, Stephen H. |
author_sort | Kim, Sunghwan |
collection | PubMed |
description | BACKGROUND: PubChem is a public repository for biological activities of small molecules. For the efficient use of its vast amount of chemical information, PubChem performs 2-dimensional (2-D) and 3-dimensional (3-D) neighborings, which precompute “neighbor” relationships between molecules in the PubChem Compound database, using the PubChem subgraph fingerprints-based 2-D similarity and the Gaussian-shape overlay-based 3-D similarity, respectively. These neighborings allow PubChem to provide the user with immediate access to the list of 2-D and 3-D neighbors (also called “Similar Compounds” and “Similar Conformers”, respectively) for each compound in PubChem. However, because 3-D neighboring is much more time-consuming than 2-D neighboring, how different the results of the two neighboring schemes are is an important question, considering limited computational resources. RESULTS: The present study analyzed the complementarity between the PubChem 2-D and 3-D neighbors. When all compounds in PubChem were considered, the overlap between 2-D and 3-D neighbors was only 2% of the total neighbors. For the data sets containing compounds with annotated information, the overlap increased as the data sets became smaller. However, it did not exceed 31% and substantial fractions of neighbors were still recognized by either PubChem 2-D or 3-D similarity, but not by both. The Neighbor Preference Index (NPI) of a molecule for a given data set was introduced, which quantified whether a molecule had more 2-D or 3-D neighbors in the data set. The NPI histogram for all PubChem compounds had a bimodal shape with two maxima at NPI = ±1 and a minimum at NPI = 0. However, the NPI histograms for the subsets containing compounds with annotated information had a greater fraction of compounds with a strong preference for one neighboring method to the other (at NPI = ±1) as well as compounds with a neutral preference (at NPI = 0). CONCLUSION: The results of our study indicate that, for the majority of the compounds in PubChem, their structural similarity to other compounds can be recognized predominantly by either 2-D or 3-D neighborings, but not by both, showing a strong complementarity between 2-D and 3-D neighboring results. Therefore, despite its heavy requirements for computational resources, 3-D neighboring provides an alternative way in which the user can instantly access structurally similar molecules that cannot be detected if only 2-D neighboring is used. [Figure: see text] |
format | Online Article Text |
id | pubmed-5097428 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2016 |
publisher | Springer International Publishing |
record_format | MEDLINE/PubMed |
spelling | pubmed-5097428 2016-11-21 Similar compounds versus similar conformers: complementarity between PubChem 2-D and 3-D neighboring sets Kim, Sunghwan Bolton, Evan E. Bryant, Stephen H. J Cheminform Research Article BACKGROUND: PubChem is a public repository for biological activities of small molecules. For the efficient use of its vast amount of chemical information, PubChem performs 2-dimensional (2-D) and 3-dimensional (3-D) neighborings, which precompute “neighbor” relationships between molecules in the PubChem Compound database, using the PubChem subgraph fingerprints-based 2-D similarity and the Gaussian-shape overlay-based 3-D similarity, respectively. These neighborings allow PubChem to provide the user with immediate access to the list of 2-D and 3-D neighbors (also called “Similar Compounds” and “Similar Conformers”, respectively) for each compound in PubChem. However, because 3-D neighboring is much more time-consuming than 2-D neighboring, how different the results of the two neighboring schemes are is an important question, considering limited computational resources. RESULTS: The present study analyzed the complementarity between the PubChem 2-D and 3-D neighbors. When all compounds in PubChem were considered, the overlap between 2-D and 3-D neighbors was only 2% of the total neighbors. For the data sets containing compounds with annotated information, the overlap increased as the data sets became smaller. However, it did not exceed 31% and substantial fractions of neighbors were still recognized by either PubChem 2-D or 3-D similarity, but not by both. The Neighbor Preference Index (NPI) of a molecule for a given data set was introduced, which quantified whether a molecule had more 2-D or 3-D neighbors in the data set. The NPI histogram for all PubChem compounds had a bimodal shape with two maxima at NPI = ±1 and a minimum at NPI = 0. However, the NPI histograms for the subsets containing compounds with annotated information had a greater fraction of compounds with a strong preference for one neighboring method to the other (at NPI = ±1) as well as compounds with a neutral preference (at NPI = 0). CONCLUSION: The results of our study indicate that, for the majority of the compounds in PubChem, their structural similarity to other compounds can be recognized predominantly by either 2-D or 3-D neighborings, but not by both, showing a strong complementarity between 2-D and 3-D neighboring results. Therefore, despite its heavy requirements for computational resources, 3-D neighboring provides an alternative way in which the user can instantly access structurally similar molecules that cannot be detected if only 2-D neighboring is used. [Figure: see text] Springer International Publishing 2016-11-04 /pmc/articles/PMC5097428/ /pubmed/27872662 http://dx.doi.org/10.1186/s13321-016-0163-1 Text en © U.S. Government 2016 COPYRIGHT NOTICE. The article is a work of the United States Government; Title 17 U.S.C. 105 provides that copyright protection is not available for any work of the United States government in the United States. Additionally, this is an open access article distributed under the terms of the Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0), which permits worldwide unrestricted use, distribution, and reproduction in any medium for any lawful purpose. |
spellingShingle | Research Article Kim, Sunghwan Bolton, Evan E. Bryant, Stephen H. Similar compounds versus similar conformers: complementarity between PubChem 2-D and 3-D neighboring sets |
title | Similar compounds versus similar conformers: complementarity between PubChem 2-D and 3-D neighboring sets |
title_full | Similar compounds versus similar conformers: complementarity between PubChem 2-D and 3-D neighboring sets |
title_fullStr | Similar compounds versus similar conformers: complementarity between PubChem 2-D and 3-D neighboring sets |
title_full_unstemmed | Similar compounds versus similar conformers: complementarity between PubChem 2-D and 3-D neighboring sets |
title_short | Similar compounds versus similar conformers: complementarity between PubChem 2-D and 3-D neighboring sets |
title_sort | similar compounds versus similar conformers: complementarity between pubchem 2-d and 3-d neighboring sets |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5097428/ https://www.ncbi.nlm.nih.gov/pubmed/27872662 http://dx.doi.org/10.1186/s13321-016-0163-1 |
work_keys_str_mv | AT kimsunghwan similarcompoundsversussimilarconformerscomplementaritybetweenpubchem2dand3dneighboringsets AT boltonevane similarcompoundsversussimilarconformerscomplementaritybetweenpubchem2dand3dneighboringsets AT bryantstephenh similarcompoundsversussimilarconformerscomplementaritybetweenpubchem2dand3dneighboringsets |
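
The abstract in this record describes two quantities without giving formulas: the overlap between a compound's 2-D and 3-D neighbor sets, and the Neighbor Preference Index (NPI). The Python sketch below shows one way such quantities might be computed. It is not taken from the article. The normalized-difference form of the NPI, its sign convention (+1 when only 3-D neighbors exist, -1 when only 2-D neighbors exist), the use of the set union as the "total neighbors" denominator, and the function names are all assumptions, chosen only to be consistent with the NPI range of -1 to +1 mentioned in the abstract.

```python
from typing import Hashable, Optional, Set


def neighbor_preference_index(n_2d: int, n_3d: int) -> Optional[float]:
    """Assumed NPI form: (n_3d - n_2d) / (n_3d + n_2d).

    Returns None when the compound has no neighbors at all,
    because the ratio is undefined in that case.
    """
    total = n_2d + n_3d
    if total == 0:
        return None
    return (n_3d - n_2d) / total


def overlap_fraction(neighbors_2d: Set[Hashable],
                     neighbors_3d: Set[Hashable]) -> float:
    """Assumed overlap measure: shared neighbors / all distinct neighbors."""
    union = neighbors_2d | neighbors_3d
    if not union:
        return 0.0
    return len(neighbors_2d & neighbors_3d) / len(union)


if __name__ == "__main__":
    # Hypothetical compound identifiers, for illustration only.
    similar_compounds = {101, 102, 103}   # 2-D neighbors
    similar_conformers = {103, 104}       # 3-D neighbors
    print(neighbor_preference_index(len(similar_compounds),
                                    len(similar_conformers)))
    print(overlap_fraction(similar_compounds, similar_conformers))
```

Under these assumptions, a compound with neighbors from only one method lands at an NPI of +1 or -1, and a compound with equal counts from both lands at 0, which mirrors the values the abstract refers to.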