One molecular fingerprint to rule them all: drugs, biomolecules, and the metabolome

BACKGROUND: Molecular fingerprints are essential cheminformatics tools for virtual screening and mapping chemical space. Among the different types of fingerprints, substructure fingerprints perform best for small molecules such as drugs, while atom-pair fingerprints are preferable for large molecule...

Bibliographic Details
Main authors: Capecchi, Alice; Probst, Daniel; Reymond, Jean-Louis
Format: Online Article Text
Language: English
Published: Springer International Publishing 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7291580/
https://www.ncbi.nlm.nih.gov/pubmed/33431010
http://dx.doi.org/10.1186/s13321-020-00445-4
_version_ 1783545935375106048
author Capecchi, Alice
Probst, Daniel
Reymond, Jean-Louis
author_facet Capecchi, Alice
Probst, Daniel
Reymond, Jean-Louis
author_sort Capecchi, Alice
collection PubMed
description BACKGROUND: Molecular fingerprints are essential cheminformatics tools for virtual screening and mapping chemical space. Among the different types of fingerprints, substructure fingerprints perform best for small molecules such as drugs, while atom-pair fingerprints are preferable for large molecules such as peptides. However, no available fingerprint achieves good performance on both classes of molecules. RESULTS: Here we set out to design a new fingerprint suitable for both small and large molecules by combining substructure and atom-pair concepts. Our quest resulted in a new fingerprint called MinHashed atom-pair fingerprint up to a diameter of four bonds (MAP4). In this fingerprint the circular substructures with radii of r = 1 and r = 2 bonds around each atom in an atom-pair are written as two pairs of SMILES, each pair being combined with the topological distance separating the two central atoms. These so-called atom-pair molecular shingles are hashed, and the resulting set of hashes is MinHashed to form the MAP4 fingerprint. MAP4 significantly outperforms all other fingerprints on an extended benchmark that combines the Riniker and Landrum small molecule benchmark with a peptide benchmark recovering BLAST analogs from either scrambled or point mutation analogs. MAP4 furthermore produces well-organized chemical space tree-maps (TMAPs) for databases as diverse as DrugBank, ChEMBL, SwissProt and the Human Metabolome Database (HMDB), and differentiates between all metabolites in HMDB, over 70% of which are indistinguishable from their nearest neighbor using substructure fingerprints. CONCLUSION: MAP4 is a new molecular fingerprint suitable for drugs, biomolecules, and the metabolome and can be adopted as a universal fingerprint to describe and search chemical space.
The source code is available at https://github.com/reymond-group/map4 and interactive MAP4 similarity search tools and TMAPs for various databases are accessible at http://map-search.gdb.tools/ and http://tm.gdb.tools/map4/.
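The shingle-then-MinHash step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (that is in the linked repository). It assumes the circular-substructure SMILES and topological distances have already been computed, for example with a cheminformatics toolkit. The shingle layout `smiles|distance|smiles`, the SHA-1 based hashing, the 128 permutations, and all function names are assumptions made for illustration.

```python
import hashlib
import random

MERSENNE_PRIME = (1 << 61) - 1   # modulus for the universal hash family
MAX_HASH = (1 << 32) - 1         # keep 32-bit fingerprint values


def shingle(env_a: str, env_b: str, distance: int) -> str:
    """Join two substructure SMILES with their topological distance.

    Sorting the two SMILES makes the shingle independent of which atom
    of the pair is visited first (assumed layout, for illustration).
    """
    first, second = sorted((env_a, env_b))
    return f"{first}|{distance}|{second}"


def hash_shingle(text: str) -> int:
    """Map a shingle string to a 32-bit integer via SHA-1."""
    digest = hashlib.sha1(text.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "little")


def make_permutations(n: int, seed: int = 42):
    """Draw n (a, b) pairs defining hash functions h -> (a*h + b) mod p."""
    rng = random.Random(seed)
    return [
        (rng.randrange(1, MERSENNE_PRIME), rng.randrange(0, MERSENNE_PRIME))
        for _ in range(n)
    ]


def minhash(shingles, perms):
    """Keep, for every hash function, the minimum value over all shingles."""
    hashes = {hash_shingle(s) for s in shingles}
    if not hashes:
        raise ValueError("at least one shingle is required")
    return [
        min(((a * h + b) % MERSENNE_PRIME) & MAX_HASH for h in hashes)
        for a, b in perms
    ]


def estimated_jaccard(fp_a, fp_b) -> float:
    """Fraction of matching positions estimates the Jaccard similarity."""
    if len(fp_a) != len(fp_b):
        raise ValueError("fingerprints must have the same length")
    return sum(x == y for x, y in zip(fp_a, fp_b)) / len(fp_a)


if __name__ == "__main__":
    perms = make_permutations(128)
    # Hypothetical substructure SMILES and distances, not taken from the paper.
    mol_a = {shingle("C(C)N", "C(C)=O", 3), shingle("C(C)N", "C(C)O", 2)}
    mol_b = {shingle("C(C)N", "C(C)=O", 3), shingle("C(C)S", "C(C)O", 4)}
    print(estimated_jaccard(minhash(mol_a, perms), minhash(mol_b, perms)))
```

Both fingerprints must be built with the same permutation list, otherwise their positions are not comparable. More permutations tighten the similarity estimate at the cost of a longer fingerprint.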
format Online
Article
Text
id pubmed-7291580
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Springer International Publishing
record_format MEDLINE/PubMed
spelling pubmed-7291580 2020-06-12 One molecular fingerprint to rule them all: drugs, biomolecules, and the metabolome Capecchi, Alice Probst, Daniel Reymond, Jean-Louis J Cheminform Research Article BACKGROUND: Molecular fingerprints are essential cheminformatics tools for virtual screening and mapping chemical space. Among the different types of fingerprints, substructure fingerprints perform best for small molecules such as drugs, while atom-pair fingerprints are preferable for large molecules such as peptides. However, no available fingerprint achieves good performance on both classes of molecules. RESULTS: Here we set out to design a new fingerprint suitable for both small and large molecules by combining substructure and atom-pair concepts. Our quest resulted in a new fingerprint called MinHashed atom-pair fingerprint up to a diameter of four bonds (MAP4). In this fingerprint the circular substructures with radii of r = 1 and r = 2 bonds around each atom in an atom-pair are written as two pairs of SMILES, each pair being combined with the topological distance separating the two central atoms. These so-called atom-pair molecular shingles are hashed, and the resulting set of hashes is MinHashed to form the MAP4 fingerprint. MAP4 significantly outperforms all other fingerprints on an extended benchmark that combines the Riniker and Landrum small molecule benchmark with a peptide benchmark recovering BLAST analogs from either scrambled or point mutation analogs. MAP4 furthermore produces well-organized chemical space tree-maps (TMAPs) for databases as diverse as DrugBank, ChEMBL, SwissProt and the Human Metabolome Database (HMDB), and differentiates between all metabolites in HMDB, over 70% of which are indistinguishable from their nearest neighbor using substructure fingerprints. CONCLUSION: MAP4 is a new molecular fingerprint suitable for drugs, biomolecules, and the metabolome and can be adopted as a universal fingerprint to describe and search chemical space.
The source code is available at https://github.com/reymond-group/map4 and interactive MAP4 similarity search tools and TMAPs for various databases are accessible at http://map-search.gdb.tools/ and http://tm.gdb.tools/map4/. Springer International Publishing 2020-06-12 /pmc/articles/PMC7291580/ /pubmed/33431010 http://dx.doi.org/10.1186/s13321-020-00445-4 Text en © The Author(s) 2020 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
spellingShingle Research Article
Capecchi, Alice
Probst, Daniel
Reymond, Jean-Louis
One molecular fingerprint to rule them all: drugs, biomolecules, and the metabolome
title One molecular fingerprint to rule them all: drugs, biomolecules, and the metabolome
title_full One molecular fingerprint to rule them all: drugs, biomolecules, and the metabolome
title_fullStr One molecular fingerprint to rule them all: drugs, biomolecules, and the metabolome
title_full_unstemmed One molecular fingerprint to rule them all: drugs, biomolecules, and the metabolome
title_short One molecular fingerprint to rule them all: drugs, biomolecules, and the metabolome
title_sort one molecular fingerprint to rule them all: drugs, biomolecules, and the metabolome
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7291580/
https://www.ncbi.nlm.nih.gov/pubmed/33431010
http://dx.doi.org/10.1186/s13321-020-00445-4
work_keys_str_mv AT capecchialice onemolecularfingerprinttorulethemalldrugsbiomoleculesandthemetabolome
AT probstdaniel onemolecularfingerprinttorulethemalldrugsbiomoleculesandthemetabolome
AT reymondjeanlouis onemolecularfingerprinttorulethemalldrugsbiomoleculesandthemetabolome