
ccbmlib – a Python package for modeling Tanimoto similarity value distributions

The ccbmlib Python package is a collection of modules for modeling similarity value distributions based on Tanimoto coefficients for fingerprints available in RDKit. It can be used to assess the statistical significance of Tanimoto coefficients and evaluate how molecular similarity is reflected when...


Bibliographic Details
Main authors: Vogt, Martin; Bajorath, Jürgen
Format: Online Article Text
Language: English
Published: F1000 Research Limited 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7050271/
https://www.ncbi.nlm.nih.gov/pubmed/32161645
http://dx.doi.org/10.12688/f1000research.22292.2
author Vogt, Martin
Bajorath, Jürgen
collection PubMed
description The ccbmlib Python package is a collection of modules for modeling similarity value distributions based on Tanimoto coefficients for fingerprints available in RDKit. It can be used to assess the statistical significance of Tanimoto coefficients and evaluate how molecular similarity is reflected when different fingerprint representations are used. Significance measures derived from p-values allow a quantitative comparison of similarity scores obtained from different fingerprint representations that might have very different value ranges. Furthermore, the package models conditional distributions of similarity coefficients for a given reference compound. The conditional significance score estimates where a test compound would be ranked in a similarity search. The models are based on the statistical analysis of feature distributions and feature correlations of fingerprints of a reference database. The resulting models have been evaluated for 11 RDKit fingerprints, taking a collection of ChEMBL compounds as a reference data set. For most fingerprints, highly accurate models were obtained, with differences of 1% or less for Tanimoto coefficients indicating high similarity.
format Online
Article
Text
id pubmed-7050271
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher F1000 Research Limited
record_format MEDLINE/PubMed
spelling pubmed-7050271 2020-03-10 ccbmlib – a Python package for modeling Tanimoto similarity value distributions Vogt, Martin Bajorath, Jürgen F1000Res Software Tool Article F1000 Research Limited 2020-03-05 /pmc/articles/PMC7050271/ /pubmed/32161645 http://dx.doi.org/10.12688/f1000research.22292.2 Text en Copyright: © 2020 Vogt M and Bajorath J http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title ccbmlib – a Python package for modeling Tanimoto similarity value distributions
topic Software Tool Article
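
The description above speaks of assessing the statistical significance of a Tanimoto coefficient via p-values. The following is a minimal sketch of that idea in plain Python, not the ccbmlib API and not its method: ccbmlib models the similarity distribution from feature statistics of a reference database, whereas this sketch simply counts values in a small empirical background. All names (`tanimoto`, `empirical_p_value`) and the toy fingerprints are hypothetical, and the (count + 1) / (n + 1) estimator is an assumption chosen so that the p-value is never exactly zero.

```python
from bisect import bisect_left
from itertools import combinations


def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto coefficient |A & B| / |A | B| of two sets of on-bit indices."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def empirical_p_value(tc: float, reference_tcs: list) -> float:
    """Estimate P(T >= tc) from a sample of background similarity values.

    Uses the (count + 1) / (n + 1) estimator; this is an illustrative
    assumption, not the model-based approach described in the record.
    """
    ordered = sorted(reference_tcs)
    # bisect_left returns how many background values are strictly below tc.
    at_least = len(ordered) - bisect_left(ordered, tc)
    return (at_least + 1) / (len(ordered) + 1)


# Hypothetical toy fingerprints (sets of on-bit indices), not real molecules.
reference = [
    frozenset(bits)
    for bits in ([1, 2, 3, 4], [3, 4, 5, 6], [1, 2, 7, 8], [5, 6, 7, 8], [1, 3, 5, 7])
]
# Background distribution: all pairwise similarities within the reference set.
background = [tanimoto(a, b) for a, b in combinations(reference, 2)]

query = frozenset([1, 2, 3, 4])
hit = frozenset([1, 2, 3, 9])
tc = tanimoto(query, hit)
p = empirical_p_value(tc, background)
print(f"Tc = {tc:.2f}, empirical p-value = {p:.3f}")
```

A small p-value means that few background pairs reach the observed similarity. Because it is a rank-based quantity, it can be compared across fingerprint types whose raw Tanimoto values occupy different ranges, which is the use case the description names.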