Cargando…

FragNet, a Contrastive Learning-Based Transformer Model for Clustering, Interpreting, Visualizing, and Navigating Chemical Space

The question of molecular similarity is core in cheminformatics and is usually assessed via a pairwise comparison based on vectors of properties or molecular fingerprints. We recently exploited variational autoencoders to embed 6M molecules in a chemical space, such that their (Euclidean) distance w...

Descripción completa

Detalles Bibliográficos
Autores principales: Shrivastava, Aditya Divyakant, Kell, Douglas B.
Formato: Online Artículo Texto
Lenguaje:English
Publicado: MDPI 2021
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8038408/
https://www.ncbi.nlm.nih.gov/pubmed/33916824
http://dx.doi.org/10.3390/molecules26072065
_version_ 1783677368594857984
author Shrivastava, Aditya Divyakant
Kell, Douglas B.
author_facet Shrivastava, Aditya Divyakant
Kell, Douglas B.
author_sort Shrivastava, Aditya Divyakant
collection PubMed
description The question of molecular similarity is core in cheminformatics and is usually assessed via a pairwise comparison based on vectors of properties or molecular fingerprints. We recently exploited variational autoencoders to embed 6M molecules in a chemical space, such that their (Euclidean) distance within the latent space so formed could be assessed within the framework of the entire molecular set. However, the standard objective function used did not seek to manipulate the latent space so as to cluster the molecules based on any perceived similarity. Using a set of some 160,000 molecules of biological relevance, we here bring together three modern elements of deep learning to create a novel and disentangled latent space, viz transformers, contrastive learning, and an embedded autoencoder. The effective dimensionality of the latent space was varied such that clear separation of individual types of molecules could be observed within individual dimensions of the latent space. The capacity of the network was such that many dimensions were not populated at all. As before, we assessed the utility of the representation by comparing clozapine with its near neighbors, and we also did the same for various antibiotics related to flucloxacillin. Transformers, especially when as here coupled with contrastive learning, effectively provide one-shot learning and lead to a successful and disentangled representation of molecular latent spaces that at once uses the entire training set in their construction while allowing “similar” molecules to cluster together in an effective and interpretable way.
format Online
Article
Text
id pubmed-8038408
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-80384082021-04-12 FragNet, a Contrastive Learning-Based Transformer Model for Clustering, Interpreting, Visualizing, and Navigating Chemical Space Shrivastava, Aditya Divyakant Kell, Douglas B. Molecules Article The question of molecular similarity is core in cheminformatics and is usually assessed via a pairwise comparison based on vectors of properties or molecular fingerprints. We recently exploited variational autoencoders to embed 6M molecules in a chemical space, such that their (Euclidean) distance within the latent space so formed could be assessed within the framework of the entire molecular set. However, the standard objective function used did not seek to manipulate the latent space so as to cluster the molecules based on any perceived similarity. Using a set of some 160,000 molecules of biological relevance, we here bring together three modern elements of deep learning to create a novel and disentangled latent space, viz transformers, contrastive learning, and an embedded autoencoder. The effective dimensionality of the latent space was varied such that clear separation of individual types of molecules could be observed within individual dimensions of the latent space. The capacity of the network was such that many dimensions were not populated at all. As before, we assessed the utility of the representation by comparing clozapine with its near neighbors, and we also did the same for various antibiotics related to flucloxacillin. Transformers, especially when as here coupled with contrastive learning, effectively provide one-shot learning and lead to a successful and disentangled representation of molecular latent spaces that at once uses the entire training set in their construction while allowing “similar” molecules to cluster together in an effective and interpretable way. MDPI 2021-04-03 /pmc/articles/PMC8038408/ /pubmed/33916824 http://dx.doi.org/10.3390/molecules26072065 Text en © 2021 by the authors. https://creativecommons.org/licenses/by/4.0/Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
spellingShingle Article
Shrivastava, Aditya Divyakant
Kell, Douglas B.
FragNet, a Contrastive Learning-Based Transformer Model for Clustering, Interpreting, Visualizing, and Navigating Chemical Space
title FragNet, a Contrastive Learning-Based Transformer Model for Clustering, Interpreting, Visualizing, and Navigating Chemical Space
title_full FragNet, a Contrastive Learning-Based Transformer Model for Clustering, Interpreting, Visualizing, and Navigating Chemical Space
title_fullStr FragNet, a Contrastive Learning-Based Transformer Model for Clustering, Interpreting, Visualizing, and Navigating Chemical Space
title_full_unstemmed FragNet, a Contrastive Learning-Based Transformer Model for Clustering, Interpreting, Visualizing, and Navigating Chemical Space
title_short FragNet, a Contrastive Learning-Based Transformer Model for Clustering, Interpreting, Visualizing, and Navigating Chemical Space
title_sort fragnet, a contrastive learning-based transformer model for clustering, interpreting, visualizing, and navigating chemical space
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8038408/
https://www.ncbi.nlm.nih.gov/pubmed/33916824
http://dx.doi.org/10.3390/molecules26072065
work_keys_str_mv AT shrivastavaadityadivyakant fragnetacontrastivelearningbasedtransformermodelforclusteringinterpretingvisualizingandnavigatingchemicalspace
AT kelldouglasb fragnetacontrastivelearningbasedtransformermodelforclusteringinterpretingvisualizingandnavigatingchemicalspace