Fast 3D shape screening of large chemical databases through alignment-recycling
BACKGROUND: Large chemical databases require fast, efficient, and simple ways of looking for similar structures. Although such tasks are now fairly well resolved for graph-based similarity queries, they remain an issue for 3D approaches, particularly for those based on 3D shape overlays. Inspired by...
Main authors: | Fontaine, Fabien; Bolton, Evan; Borodina, Yulia; Bryant, Stephen H |
---|---|
Format: | Text |
Language: | English |
Published: | BioMed Central, 2007 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1994057/ https://www.ncbi.nlm.nih.gov/pubmed/17880744 http://dx.doi.org/10.1186/1752-153X-1-12 |
_version_ | 1782135478071328768 |
---|---|
author | Fontaine, Fabien; Bolton, Evan; Borodina, Yulia; Bryant, Stephen H |
collection | PubMed |
description | BACKGROUND: Large chemical databases require fast, efficient, and simple ways of looking for similar structures. Although such tasks are now fairly well resolved for graph-based similarity queries, they remain an issue for 3D approaches, particularly for those based on 3D shape overlays. Inspired by a recent technique developed to compare molecular shapes, we designed a hybrid methodology, alignment-recycling, that enables efficient retrieval and alignment of structures with similar 3D shapes. RESULTS: Using a dataset of more than one million PubChem compounds of limited size (< 28 heavy atoms) and flexibility (< 6 rotatable bonds), we obtained a set of a few thousand diverse structures that entirely covers the 3D shape space of the dataset's conformers. Transformation matrices gathered from the overlays between these diverse structures and the 3D conformer dataset allowed us to reduce the CPU time required for shape overlay 100-fold. The alignment-recycling heuristic produces results consistent with de novo alignment calculations, with better than 80% hit-list overlap on average. CONCLUSION: Overlay-based 3D methods are computationally demanding when searching large databases. Alignment-recycling reduces the CPU time needed for shape similarity searches by breaking the alignment problem into three steps: selection of diverse shapes to describe the database shape space; overlay of the database conformers onto the diverse shapes; and non-optimized overlay of query and database conformers using common reference shapes. The precomputation required by the first two steps is a significant cost of the method; however, once it is performed, querying is two orders of magnitude faster. Extensions and variations of this methodology, for example to handle larger and more flexible small molecules, are discussed. |
format | Text |
id | pubmed-1994057 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2007 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-1994057 (2007-09-25). Chem Cent J, Methodology. BioMed Central, 2007-06-06. Copyright © 2007 Fontaine et al. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
title | Fast 3D shape screening of large chemical databases through alignment-recycling |
topic | Methodology |
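The abstract breaks alignment-recycling into three steps. The last step is a non-optimized overlay of query and database conformers through common reference shapes. The Python sketch below illustrates only that query-time step, under stated assumptions. If the query and a database conformer have each been overlaid onto the same reference shape, then composing the inverse of the query's transform with the conformer's transform places the conformer in the query frame without running a new optimization. This composition is inferred from the abstract's mention of "transformation matrices" and "common reference shapes." Several other details are illustrative assumptions and are not the authors' implementation: the first-order Gaussian shape-Tanimoto scoring, the single exponent `alpha = 0.8` with amplitude `p = 2.7`, the dict layout keyed by a reference-shape id, and every function name.

```python
import math

# Hypothetical sketch of the query-time step of alignment-recycling.
# A rigid transform is stored as (R, t): a point x maps to R·x + t,
# where R is a 3x3 rotation matrix given as a list of rows.


def apply(transform, points):
    """Apply a rigid transform to a list of (x, y, z) coordinates."""
    R, t = transform
    return [tuple(sum(R[i][k] * p[k] for k in range(3)) + t[i] for i in range(3))
            for p in points]


def compose(a, b):
    """Return the transform that applies b first, then a."""
    Ra, ta = a
    Rb, tb = b
    R = [[sum(Ra[i][k] * Rb[k][j] for k in range(3)) for j in range(3)]
         for i in range(3)]
    t = [sum(Ra[i][k] * tb[k] for k in range(3)) + ta[i] for i in range(3)]
    return R, t


def invert(transform):
    """Inverse of a rigid transform: x = R^T·(x' - t)."""
    R, t = transform
    Rt = [[R[j][i] for j in range(3)] for i in range(3)]
    return Rt, [-sum(Rt[i][k] * t[k] for k in range(3)) for i in range(3)]


def recycled_pose(query_to_ref, conformer_to_ref):
    """Pose of a database conformer in the query frame, via a shared reference
    shape and with no further optimization: conformer -> reference -> query."""
    return compose(invert(query_to_ref), conformer_to_ref)


def gaussian_overlap(atoms_a, atoms_b, alpha=0.8, p=2.7):
    """First-order overlap volume of spherical atomic Gaussians p·exp(-alpha·r^2).
    A single illustrative alpha is used for every atom (an assumption)."""
    prefactor = p * p * (math.pi / (2.0 * alpha)) ** 1.5
    total = 0.0
    for xa in atoms_a:
        for xb in atoms_b:
            d2 = sum((xa[i] - xb[i]) ** 2 for i in range(3))
            total += prefactor * math.exp(-0.5 * alpha * d2)
    return total


def shape_tanimoto(atoms_a, atoms_b):
    """Shape Tanimoto O_AB / (O_AA + O_BB - O_AB) for two fixed poses."""
    o_ab = gaussian_overlap(atoms_a, atoms_b)
    return o_ab / (gaussian_overlap(atoms_a, atoms_a)
                   + gaussian_overlap(atoms_b, atoms_b) - o_ab)


def recycled_score(query, query_to_ref, conformer, conformer_to_ref):
    """Best shape Tanimoto over all reference shapes shared by both molecules.

    query_to_ref and conformer_to_ref map a (hypothetical) reference-shape id to
    the precomputed transform overlaying that molecule onto the reference.
    Returns 0.0 when the two molecules share no reference shape.
    """
    best = 0.0
    for ref_id in query_to_ref.keys() & conformer_to_ref.keys():
        pose = recycled_pose(query_to_ref[ref_id], conformer_to_ref[ref_id])
        best = max(best, shape_tanimoto(query, apply(pose, conformer)))
    return best


def hit_list_overlap(recycled_hits, de_novo_hits, k):
    """Fraction of the top-k hits shared by the recycled and de novo rankings."""
    return len(set(recycled_hits[:k]) & set(de_novo_hits[:k])) / k
```

The conformer-to-reference transforms are precomputed. As a result, each comparison of a query with a database conformer in this sketch costs one transform composition and one overlap evaluation per shared reference shape, with no optimization loop. This is consistent with the abstract's point that the expensive work moves into precomputation. The `hit_list_overlap` helper shows one simple way to compare a recycled ranking against a de novo ranking, in the spirit of the reported hit-list agreement. The cutoff `k` is left to the user.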