Comparing structural fingerprints using a literature-based similarity benchmark

BACKGROUND: The concept of molecular similarity is one of the central ideas in cheminformatics, despite the fact that it is ill-defined and rather difficult to assess objectively. Here we propose a practical definition of molecular similarity in the context of drug discovery: molecules A and B are s...

Bibliographic Details
Main Authors: O’Boyle, Noel M., Sayle, Roger A.
Format: Online Article Text
Language: English
Published: Springer International Publishing 2016
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4932683/
https://www.ncbi.nlm.nih.gov/pubmed/27382417
http://dx.doi.org/10.1186/s13321-016-0148-0
_version_ 1782441106992005120
author O’Boyle, Noel M.
Sayle, Roger A.
author_facet O’Boyle, Noel M.
Sayle, Roger A.
author_sort O’Boyle, Noel M.
collection PubMed
description BACKGROUND: The concept of molecular similarity is one of the central ideas in cheminformatics, despite the fact that it is ill-defined and rather difficult to assess objectively. Here we propose a practical definition of molecular similarity in the context of drug discovery: molecules A and B are similar if a medicinal chemist would be likely to synthesise and test them around the same time as part of the same medicinal chemistry program. The attraction of such a definition is that it matches one of the key uses of similarity measures in early-stage drug discovery. If we make the assumption that molecules in the same compound activity table in a medicinal chemistry paper were considered similar by the authors of the paper, we can create a dataset of similar molecules from the medicinal chemistry literature. Furthermore, molecules with decreasing levels of similarity to a reference can be found by either ordering molecules in an activity table by their activity, or by considering activity tables in different papers which have at least one molecule in common.
RESULTS: Using this procedure with activity data from ChEMBL, we have created two benchmark datasets for structural similarity that can be used to guide the development of improved measures. Compared to similar results from a virtual screen, these benchmarks are an order of magnitude more sensitive to differences between fingerprints both because of their size and because they avoid loss of statistical power due to the use of mean scores or ranks. We measure the performance of 28 different fingerprints on the benchmark sets and compare the results to those from the Riniker and Landrum (J Cheminf 5:26, 2013. doi:10.1186/1758-2946-5-26) ligand-based virtual screening benchmark.
CONCLUSIONS: Extended-connectivity fingerprints of diameter 4 and 6 are among the best performing fingerprints when ranking diverse structures by similarity, as is the topological torsion fingerprint. However, when ranking very close analogues, the atom pair fingerprint outperforms the others tested. When ranking diverse structures or carrying out a virtual screen, we find that the performance of the ECFP fingerprints significantly improves if the bit-vector length is increased from 1024 to 16,384.
ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s13321-016-0148-0) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-4932683
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher Springer International Publishing
record_format MEDLINE/PubMed
spelling pubmed-4932683 2016-07-06 Comparing structural fingerprints using a literature-based similarity benchmark O’Boyle, Noel M. Sayle, Roger A. J Cheminform Research Article Springer International Publishing 2016-07-05 /pmc/articles/PMC4932683/ /pubmed/27382417 http://dx.doi.org/10.1186/s13321-016-0148-0 Text en © The Author(s) 2016 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle Research Article
O’Boyle, Noel M.
Sayle, Roger A.
Comparing structural fingerprints using a literature-based similarity benchmark
title Comparing structural fingerprints using a literature-based similarity benchmark
title_sort comparing structural fingerprints using a literature-based similarity benchmark
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4932683/
https://www.ncbi.nlm.nih.gov/pubmed/27382417
http://dx.doi.org/10.1186/s13321-016-0148-0
work_keys_str_mv AT oboylenoelm comparingstructuralfingerprintsusingaliteraturebasedsimilaritybenchmark
AT saylerogera comparingstructuralfingerprintsusingaliteraturebasedsimilaritybenchmark