Sachem: a chemical cartridge for high-performance substructure search

BACKGROUND: Structure search is one of the valuable capabilities of small-molecule databases. Fingerprint-based screening methods are usually employed to enhance the search performance by reducing the number of calls to the verification procedure. In substructure search, fingerprints are designed to...

Bibliographic Details
Main Authors: Kratochvíl, Miroslav; Vondrášek, Jiří; Galgonek, Jakub
Format: Online Article Text
Language: English
Published: Springer International Publishing 2018
Subjects: Software
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5966370/
https://www.ncbi.nlm.nih.gov/pubmed/29797000
http://dx.doi.org/10.1186/s13321-018-0282-y
author Kratochvíl, Miroslav
Vondrášek, Jiří
Galgonek, Jakub
collection PubMed
description BACKGROUND: Structure search is one of the valuable capabilities of small-molecule databases. Fingerprint-based screening methods are usually employed to enhance the search performance by reducing the number of calls to the verification procedure. In substructure search, fingerprints are designed to capture important structural aspects of the molecule to aid the decision about whether the molecule contains a given substructure. Currently available cartridges typically provide acceptable search performance for processing user queries, but do not scale satisfactorily with dataset size. RESULTS: We present Sachem, a new open-source chemical cartridge that implements two substructure search methods: The first is a performance-oriented reimplementation of substructure indexing based on the OrChem fingerprint, and the second is a novel method that employs newly designed fingerprints stored in inverted indices. We assessed the performance of both methods on small, medium, and large datasets containing 1, 10, and 94 million compounds, respectively. Comparison of Sachem with other freely available cartridges revealed improvements in overall performance, scaling potential and screen-out efficiency. CONCLUSIONS: The Sachem cartridge allows efficient substructure searches in databases of all sizes. The sublinear performance scaling of the second method and the ability to efficiently query large amounts of pre-extracted information may together open the door to new applications for substructure searches.
format Online
Article
Text
id pubmed-5966370
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher Springer International Publishing
record_format MEDLINE/PubMed
spelling pubmed-5966370 2018-06-05 Sachem: a chemical cartridge for high-performance substructure search. Kratochvíl, Miroslav; Vondrášek, Jiří; Galgonek, Jakub. J Cheminform. Software.
Springer International Publishing 2018-05-23 /pmc/articles/PMC5966370/ /pubmed/29797000 http://dx.doi.org/10.1186/s13321-018-0282-y Text en © The Author(s) 2018. Open Access: This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Sachem: a chemical cartridge for high-performance substructure search
topic Software