Comparing structural fingerprints using a literature-based similarity benchmark
BACKGROUND: The concept of molecular similarity is one of the central ideas in cheminformatics, despite the fact that it is ill-defined and rather difficult to assess objectively. Here we propose a practical definition of molecular similarity in the context of drug discovery: molecules A and B are s...
Main authors: | O’Boyle, Noel M.; Sayle, Roger A. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Springer International Publishing 2016 |
Subjects: | Research Article |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4932683/ https://www.ncbi.nlm.nih.gov/pubmed/27382417 http://dx.doi.org/10.1186/s13321-016-0148-0 |
_version_ | 1782441106992005120 |
---|---|
author | O’Boyle, Noel M. Sayle, Roger A. |
collection | PubMed |
description | BACKGROUND: The concept of molecular similarity is one of the central ideas in cheminformatics, despite the fact that it is ill-defined and rather difficult to assess objectively. Here we propose a practical definition of molecular similarity in the context of drug discovery: molecules A and B are similar if a medicinal chemist would be likely to synthesise and test them around the same time as part of the same medicinal chemistry program. The attraction of such a definition is that it matches one of the key uses of similarity measures in early-stage drug discovery. If we make the assumption that molecules in the same compound activity table in a medicinal chemistry paper were considered similar by the authors of the paper, we can create a dataset of similar molecules from the medicinal chemistry literature. Furthermore, molecules with decreasing levels of similarity to a reference can be found by either ordering molecules in an activity table by their activity, or by considering activity tables in different papers which have at least one molecule in common. RESULTS: Using this procedure with activity data from ChEMBL, we have created two benchmark datasets for structural similarity that can be used to guide the development of improved measures. Compared to similar results from a virtual screen, these benchmarks are an order of magnitude more sensitive to differences between fingerprints both because of their size and because they avoid loss of statistical power due to the use of mean scores or ranks. We measure the performance of 28 different fingerprints on the benchmark sets and compare the results to those from the Riniker and Landrum (J Cheminf 5:26, 2013. doi:10.1186/1758-2946-5-26) ligand-based virtual screening benchmark. CONCLUSIONS: Extended-connectivity fingerprints of diameter 4 and 6 are among the best performing fingerprints when ranking diverse structures by similarity, as is the topological torsion fingerprint. However, when ranking very close analogues, the atom pair fingerprint outperforms the others tested. When ranking diverse structures or carrying out a virtual screen, we find that the performance of the ECFP fingerprints significantly improves if the bit-vector length is increased from 1024 to 16,384. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s13321-016-0148-0) contains supplementary material, which is available to authorized users. |
format | Online Article Text |
id | pubmed-4932683 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2016 |
publisher | Springer International Publishing |
record_format | MEDLINE/PubMed |
spelling | pubmed-4932683 2016-07-06 Comparing structural fingerprints using a literature-based similarity benchmark O’Boyle, Noel M. Sayle, Roger A. J Cheminform Research Article Springer International Publishing 2016-07-05 /pmc/articles/PMC4932683/ /pubmed/27382417 http://dx.doi.org/10.1186/s13321-016-0148-0 Text en © The Author(s) 2016 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
title | Comparing structural fingerprints using a literature-based similarity benchmark |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4932683/ https://www.ncbi.nlm.nih.gov/pubmed/27382417 http://dx.doi.org/10.1186/s13321-016-0148-0 |
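The abstract reports that ECFP performance improves when the bit-vector length is increased from 1024 to 16,384. One plausible reason is that shorter vectors force more distinct features onto the same bit, which can inflate similarity scores. The following is a minimal sketch of that effect only, not the authors' method and not an ECFP implementation: the feature identifiers are hypothetical integers chosen to collide, folding is assumed to be a simple modulo, and the helper names `fold` and `tanimoto` are illustrative.

```python
def fold(feature_ids, n_bits):
    """Fold integer feature identifiers into a set of 'on' bit positions.

    Assumption: folding is a plain modulo; real toolkits may hash differently.
    """
    return {f % n_bits for f in feature_ids}


def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient |A & B| / |A | B| of two sets of on-bits."""
    union = len(bits_a | bits_b)
    return len(bits_a & bits_b) / union if union else 0.0


# Hypothetical feature identifiers for two molecules, contrived so that
# they share one real feature (3) but collide heavily at 1024 bits.
mol_a = {3, 1027, 5000, 20000}
mol_b = {3, 2051, 9096, 21024}

for n_bits in (1024, 16384):
    score = tanimoto(fold(mol_a, n_bits), fold(mol_b, n_bits))
    print(f"{n_bits:>6} bits: Tanimoto = {score:.3f}")
```

With these contrived identifiers, folding to 1024 bits maps both molecules onto the same three bits, so they look identical, while folding to 16,384 bits keeps all features distinct and leaves only the one genuinely shared feature in common.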