Cargando…

Systematic benchmark of substructure search in molecular graphs - From Ullmann to VF2

BACKGROUND: Searching for substructures in molecules belongs to the most elementary tasks in cheminformatics and is nowadays part of virtually every cheminformatics software. The underlying algorithms, used over several decades, are designed for the application to general graphs. Applied on molecula...

Descripción completa

Detalles Bibliográficos
Autores principales: Ehrlich, Hans-Christian, Rarey, Matthias
Formato: Online Artículo Texto
Lenguaje:English
Publicado: BioMed Central 2012
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3586954/
https://www.ncbi.nlm.nih.gov/pubmed/22849361
http://dx.doi.org/10.1186/1758-2946-4-13
_version_ 1782261379438215168
author Ehrlich, Hans-Christian
Rarey, Matthias
author_facet Ehrlich, Hans-Christian
Rarey, Matthias
author_sort Ehrlich, Hans-Christian
collection PubMed
description BACKGROUND: Searching for substructures in molecules belongs to the most elementary tasks in cheminformatics and is nowadays part of virtually every cheminformatics software. The underlying algorithms, used over several decades, are designed for the application to general graphs. Applied on molecular graphs, little effort has been spend on characterizing their performance. Therefore, it is not clear how current substructure search algorithms behave on such special graphs. One of the main reasons why such an evaluation was not performed in the past was the absence of appropriate data sets. RESULTS: In this paper, we present a systematic evaluation of Ullmann’s and the VF2 subgraph isomorphism algorithms on molecular data. The benchmark set consists of a collection of 1235 SMARTS substructure expressions and selected molecules from the ZINC database. The benchmark evaluates substructures search times for complete database scans as well as individual substructure-molecule pairs. In detail, we focus on the influence of substructure formulation and size, the impact of molecule size, and the ability of both algorithms to be used on multiple cores. CONCLUSIONS: The results show a clear superiority of the VF2 algorithm in all test scenarios. In general, both algorithms solve most instances in less than one millisecond, which we consider to be acceptable. Still, in direct comparison, the VF2 is most often several folds faster than Ullmann’s algorithm. Additionally, Ullmann’s algorithm shows a surprising number of run time outliers.
format Online
Article
Text
id pubmed-3586954
institution National Center for Biotechnology Information
language English
publishDate 2012
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-35869542013-03-08 Systematic benchmark of substructure search in molecular graphs - From Ullmann to VF2 Ehrlich, Hans-Christian Rarey, Matthias J Cheminform Research Article BACKGROUND: Searching for substructures in molecules belongs to the most elementary tasks in cheminformatics and is nowadays part of virtually every cheminformatics software. The underlying algorithms, used over several decades, are designed for the application to general graphs. Applied on molecular graphs, little effort has been spend on characterizing their performance. Therefore, it is not clear how current substructure search algorithms behave on such special graphs. One of the main reasons why such an evaluation was not performed in the past was the absence of appropriate data sets. RESULTS: In this paper, we present a systematic evaluation of Ullmann’s and the VF2 subgraph isomorphism algorithms on molecular data. The benchmark set consists of a collection of 1235 SMARTS substructure expressions and selected molecules from the ZINC database. The benchmark evaluates substructures search times for complete database scans as well as individual substructure-molecule pairs. In detail, we focus on the influence of substructure formulation and size, the impact of molecule size, and the ability of both algorithms to be used on multiple cores. CONCLUSIONS: The results show a clear superiority of the VF2 algorithm in all test scenarios. In general, both algorithms solve most instances in less than one millisecond, which we consider to be acceptable. Still, in direct comparison, the VF2 is most often several folds faster than Ullmann’s algorithm. Additionally, Ullmann’s algorithm shows a surprising number of run time outliers. BioMed Central 2012-07-31 /pmc/articles/PMC3586954/ /pubmed/22849361 http://dx.doi.org/10.1186/1758-2946-4-13 Text en Copyright ©2012 Ehrlich and Rarey; licensee Chemistry Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Research Article
Ehrlich, Hans-Christian
Rarey, Matthias
Systematic benchmark of substructure search in molecular graphs - From Ullmann to VF2
title Systematic benchmark of substructure search in molecular graphs - From Ullmann to VF2
title_full Systematic benchmark of substructure search in molecular graphs - From Ullmann to VF2
title_fullStr Systematic benchmark of substructure search in molecular graphs - From Ullmann to VF2
title_full_unstemmed Systematic benchmark of substructure search in molecular graphs - From Ullmann to VF2
title_short Systematic benchmark of substructure search in molecular graphs - From Ullmann to VF2
title_sort systematic benchmark of substructure search in molecular graphs - from ullmann to vf2
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3586954/
https://www.ncbi.nlm.nih.gov/pubmed/22849361
http://dx.doi.org/10.1186/1758-2946-4-13
work_keys_str_mv AT ehrlichhanschristian systematicbenchmarkofsubstructuresearchinmoleculargraphsfromullmanntovf2
AT rareymatthias systematicbenchmarkofsubstructuresearchinmoleculargraphsfromullmanntovf2