Cargando…

Introducing a Chemically Intuitive Core-Substituent Fingerprint Designed to Explore Structural Requirements for Effective Similarity Searching and Machine Learning

Fingerprint (FP) representations of chemical structure continue to be one of the most widely used types of molecular descriptors in chemoinformatics and computational medicinal chemistry. One often distinguishes between two- and three-dimensional (2D and 3D) FPs depending on whether they are derived...

Descripción completa

Detalles Bibliográficos
Autores principales: Janela, Tiago, Takeuchi, Kosuke, Bajorath, Jürgen
Formato: Online Artículo Texto
Lenguaje:English
Publicado: MDPI 2022
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9000322/
https://www.ncbi.nlm.nih.gov/pubmed/35408730
http://dx.doi.org/10.3390/molecules27072331
_version_ 1784685406457954304
author Janela, Tiago
Takeuchi, Kosuke
Bajorath, Jürgen
author_facet Janela, Tiago
Takeuchi, Kosuke
Bajorath, Jürgen
author_sort Janela, Tiago
collection PubMed
description Fingerprint (FP) representations of chemical structure continue to be one of the most widely used types of molecular descriptors in chemoinformatics and computational medicinal chemistry. One often distinguishes between two- and three-dimensional (2D and 3D) FPs depending on whether they are derived from molecular graphs or conformations, respectively. Primary application areas for FPs include similarity searching and compound classification via machine learning, especially for hit identification. For these applications, 2D FPs are particularly popular, given their robustness and for the most part comparable (or better) performance to 3D FPs. While a variety of FP prototypes has been designed and evaluated during earlier times of chemoinformatics research, new developments have been rare over the past decade. At least in part, this has been due to the situation that topological (atom environment) FPs derived from molecular graphs have evolved as a gold standard in the field. We were interested in exploring the question of whether the amount of structural information captured by state-of-the-art 2D FPs is indeed required for effective similarity searching and compound classification or whether accounting for fewer structural features might be sufficient. Therefore, pursuing a “structural minimalist” approach, we designed and implemented a new 2D FP based upon ring and substituent fragments obtained by systematically decomposing large numbers of compounds from medicinal chemistry. The resulting FP termed core-substituent FP (CSFP) captures much smaller numbers of structural features than state-of-the-art 2D FPs. However, CSFP achieves high performance in similarity searching and machine learning, demonstrating that less structural information is required for establishing molecular similarity relationships than is often believed. Given its high performance and chemical tangibility, CSFP is also relevant for practical applications in medicinal chemistry.
format Online
Article
Text
id pubmed-9000322
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-90003222022-04-12 Introducing a Chemically Intuitive Core-Substituent Fingerprint Designed to Explore Structural Requirements for Effective Similarity Searching and Machine Learning Janela, Tiago Takeuchi, Kosuke Bajorath, Jürgen Molecules Article Fingerprint (FP) representations of chemical structure continue to be one of the most widely used types of molecular descriptors in chemoinformatics and computational medicinal chemistry. One often distinguishes between two- and three-dimensional (2D and 3D) FPs depending on whether they are derived from molecular graphs or conformations, respectively. Primary application areas for FPs include similarity searching and compound classification via machine learning, especially for hit identification. For these applications, 2D FPs are particularly popular, given their robustness and for the most part comparable (or better) performance to 3D FPs. While a variety of FP prototypes has been designed and evaluated during earlier times of chemoinformatics research, new developments have been rare over the past decade. At least in part, this has been due to the situation that topological (atom environment) FPs derived from molecular graphs have evolved as a gold standard in the field. We were interested in exploring the question of whether the amount of structural information captured by state-of-the-art 2D FPs is indeed required for effective similarity searching and compound classification or whether accounting for fewer structural features might be sufficient. Therefore, pursuing a “structural minimalist” approach, we designed and implemented a new 2D FP based upon ring and substituent fragments obtained by systematically decomposing large numbers of compounds from medicinal chemistry. The resulting FP termed core-substituent FP (CSFP) captures much smaller numbers of structural features than state-of-the-art 2D FPs. However, CSFP achieves high performance in similarity searching and machine learning, demonstrating that less structural information is required for establishing molecular similarity relationships than is often believed. Given its high performance and chemical tangibility, CSFP is also relevant for practical applications in medicinal chemistry. MDPI 2022-04-04 /pmc/articles/PMC9000322/ /pubmed/35408730 http://dx.doi.org/10.3390/molecules27072331 Text en © 2022 by the authors. https://creativecommons.org/licenses/by/4.0/Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
spellingShingle Article
Janela, Tiago
Takeuchi, Kosuke
Bajorath, Jürgen
Introducing a Chemically Intuitive Core-Substituent Fingerprint Designed to Explore Structural Requirements for Effective Similarity Searching and Machine Learning
title Introducing a Chemically Intuitive Core-Substituent Fingerprint Designed to Explore Structural Requirements for Effective Similarity Searching and Machine Learning
title_full Introducing a Chemically Intuitive Core-Substituent Fingerprint Designed to Explore Structural Requirements for Effective Similarity Searching and Machine Learning
title_fullStr Introducing a Chemically Intuitive Core-Substituent Fingerprint Designed to Explore Structural Requirements for Effective Similarity Searching and Machine Learning
title_full_unstemmed Introducing a Chemically Intuitive Core-Substituent Fingerprint Designed to Explore Structural Requirements for Effective Similarity Searching and Machine Learning
title_short Introducing a Chemically Intuitive Core-Substituent Fingerprint Designed to Explore Structural Requirements for Effective Similarity Searching and Machine Learning
title_sort introducing a chemically intuitive core-substituent fingerprint designed to explore structural requirements for effective similarity searching and machine learning
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9000322/
https://www.ncbi.nlm.nih.gov/pubmed/35408730
http://dx.doi.org/10.3390/molecules27072331
work_keys_str_mv AT janelatiago introducingachemicallyintuitivecoresubstituentfingerprintdesignedtoexplorestructuralrequirementsforeffectivesimilaritysearchingandmachinelearning
AT takeuchikosuke introducingachemicallyintuitivecoresubstituentfingerprintdesignedtoexplorestructuralrequirementsforeffectivesimilaritysearchingandmachinelearning
AT bajorathjurgen introducingachemicallyintuitivecoresubstituentfingerprintdesignedtoexplorestructuralrequirementsforeffectivesimilaritysearchingandmachinelearning