Cargando…
Systematic benchmark of substructure search in molecular graphs - From Ullmann to VF2
BACKGROUND: Searching for substructures in molecules belongs to the most elementary tasks in cheminformatics and is nowadays part of virtually every cheminformatics software. The underlying algorithms, used over several decades, are designed for the application to general graphs. Applied on molecula...
Autores principales: | , |
---|---|
Formato: | Online Artículo Texto |
Lenguaje: | English |
Publicado: |
BioMed Central
2012
|
Materias: | |
Acceso en línea: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3586954/ https://www.ncbi.nlm.nih.gov/pubmed/22849361 http://dx.doi.org/10.1186/1758-2946-4-13 |
_version_ | 1782261379438215168 |
---|---|
author | Ehrlich, Hans-Christian Rarey, Matthias |
author_facet | Ehrlich, Hans-Christian Rarey, Matthias |
author_sort | Ehrlich, Hans-Christian |
collection | PubMed |
description | BACKGROUND: Searching for substructures in molecules belongs to the most elementary tasks in cheminformatics and is nowadays part of virtually every cheminformatics software. The underlying algorithms, used over several decades, are designed for the application to general graphs. Applied on molecular graphs, little effort has been spend on characterizing their performance. Therefore, it is not clear how current substructure search algorithms behave on such special graphs. One of the main reasons why such an evaluation was not performed in the past was the absence of appropriate data sets. RESULTS: In this paper, we present a systematic evaluation of Ullmann’s and the VF2 subgraph isomorphism algorithms on molecular data. The benchmark set consists of a collection of 1235 SMARTS substructure expressions and selected molecules from the ZINC database. The benchmark evaluates substructures search times for complete database scans as well as individual substructure-molecule pairs. In detail, we focus on the influence of substructure formulation and size, the impact of molecule size, and the ability of both algorithms to be used on multiple cores. CONCLUSIONS: The results show a clear superiority of the VF2 algorithm in all test scenarios. In general, both algorithms solve most instances in less than one millisecond, which we consider to be acceptable. Still, in direct comparison, the VF2 is most often several folds faster than Ullmann’s algorithm. Additionally, Ullmann’s algorithm shows a surprising number of run time outliers. |
format | Online Article Text |
id | pubmed-3586954 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2012 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-35869542013-03-08 Systematic benchmark of substructure search in molecular graphs - From Ullmann to VF2 Ehrlich, Hans-Christian Rarey, Matthias J Cheminform Research Article BACKGROUND: Searching for substructures in molecules belongs to the most elementary tasks in cheminformatics and is nowadays part of virtually every cheminformatics software. The underlying algorithms, used over several decades, are designed for the application to general graphs. Applied on molecular graphs, little effort has been spend on characterizing their performance. Therefore, it is not clear how current substructure search algorithms behave on such special graphs. One of the main reasons why such an evaluation was not performed in the past was the absence of appropriate data sets. RESULTS: In this paper, we present a systematic evaluation of Ullmann’s and the VF2 subgraph isomorphism algorithms on molecular data. The benchmark set consists of a collection of 1235 SMARTS substructure expressions and selected molecules from the ZINC database. The benchmark evaluates substructures search times for complete database scans as well as individual substructure-molecule pairs. In detail, we focus on the influence of substructure formulation and size, the impact of molecule size, and the ability of both algorithms to be used on multiple cores. CONCLUSIONS: The results show a clear superiority of the VF2 algorithm in all test scenarios. In general, both algorithms solve most instances in less than one millisecond, which we consider to be acceptable. Still, in direct comparison, the VF2 is most often several folds faster than Ullmann’s algorithm. Additionally, Ullmann’s algorithm shows a surprising number of run time outliers. BioMed Central 2012-07-31 /pmc/articles/PMC3586954/ /pubmed/22849361 http://dx.doi.org/10.1186/1758-2946-4-13 Text en Copyright ©2012 Ehrlich and Rarey; licensee Chemistry Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Research Article Ehrlich, Hans-Christian Rarey, Matthias Systematic benchmark of substructure search in molecular graphs - From Ullmann to VF2 |
title | Systematic benchmark of substructure search in molecular graphs - From Ullmann to VF2 |
title_full | Systematic benchmark of substructure search in molecular graphs - From Ullmann to VF2 |
title_fullStr | Systematic benchmark of substructure search in molecular graphs - From Ullmann to VF2 |
title_full_unstemmed | Systematic benchmark of substructure search in molecular graphs - From Ullmann to VF2 |
title_short | Systematic benchmark of substructure search in molecular graphs - From Ullmann to VF2 |
title_sort | systematic benchmark of substructure search in molecular graphs - from ullmann to vf2 |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3586954/ https://www.ncbi.nlm.nih.gov/pubmed/22849361 http://dx.doi.org/10.1186/1758-2946-4-13 |
work_keys_str_mv | AT ehrlichhanschristian systematicbenchmarkofsubstructuresearchinmoleculargraphsfromullmanntovf2 AT rareymatthias systematicbenchmarkofsubstructuresearchinmoleculargraphsfromullmanntovf2 |