
Distributed Representation of Chemical Fragments

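This article describes an unsupervised machine learning method for computing distributed vector representations of molecular fragments. These vectors encode fragment features in a continuous high-dimensional space and enable similarity computation between individual fragments, even for small fragments with only two heavy atoms. The method is based on a word embedding algorithm borrowed from the natural language processing field, and approximately 6 million unlabeled PubChem chemicals were used for training. The resulting dense fragment vectors are in contrast to the traditional sparse "one-hot" fragment representation and capture rich relational structure in the fragment space. The vectors of small linear fragments were averaged to yield distributed vectors of bigger fragments and molecules, which were used for different tasks, e.g., clustering, ligand recall, and quantitative structure–activity relationship modeling. The distributed vectors were found to be better at clustering ring systems and recall of kinase ligands compared with standard binary fingerprints. This work demonstrates unsupervised learning of fragment chemistry from large sets of unlabeled chemical structures and subsequent application to supervised training on relatively small data sets of labeled chemicals.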
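Two ideas from this article's summary, the contrast between dense fragment vectors and sparse one-hot codes, and the averaging of fragment vectors into vectors for larger fragments or molecules, can be illustrated with a small Python sketch. The fragment labels and four-dimensional vectors below are hypothetical placeholders chosen by hand, not output of the article's model, which learned its vectors with a word embedding algorithm from about 6 million PubChem structures. The helper names (`cosine`, `average_vector`, `one_hot`) are likewise illustrative, and cosine similarity is assumed here as the comparison measure.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]

# Hypothetical toy embeddings: the fragment labels and values are illustrative
# placeholders, NOT vectors from the article's trained word-embedding model.
FRAGMENT_VECTORS: Dict[str, Vector] = {
    "C-C": [0.9, 0.1, 0.0, 0.2],
    "C-O": [0.7, 0.6, 0.1, 0.1],
    "C-N": [0.6, 0.2, 0.7, 0.1],
    "C=O": [0.5, 0.8, 0.1, 0.3],
}


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def average_vector(fragments: Sequence[str], table: Dict[str, Vector]) -> Vector:
    """Element-wise mean of the vectors of the listed fragments.

    Unknown fragments are skipped; this is an assumption of the sketch,
    not a documented detail of the article's method.
    """
    known = [table[f] for f in fragments if f in table]
    if not known:
        raise ValueError("no known fragments to average")
    dim = len(known[0])
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]


def one_hot(fragment: str, vocabulary: Sequence[str]) -> Vector:
    """Sparse one-hot code: 1.0 at the fragment's vocabulary position, 0.0 elsewhere."""
    return [1.0 if item == fragment else 0.0 for item in vocabulary]


if __name__ == "__main__":
    vocab = list(FRAGMENT_VECTORS)
    # Distinct one-hot codes are orthogonal, so their cosine similarity is always 0.
    print(cosine(one_hot("C-O", vocab), one_hot("C=O", vocab)))  # 0.0
    # Dense vectors give graded similarities between individual fragments.
    print(round(cosine(FRAGMENT_VECTORS["C-O"], FRAGMENT_VECTORS["C=O"]), 3))
    # Hypothetical "molecule" vectors built by averaging their fragment vectors.
    mol_a = average_vector(["C-C", "C-O", "C-C"], FRAGMENT_VECTORS)
    mol_b = average_vector(["C-C", "C=O"], FRAGMENT_VECTORS)
    print(round(cosine(mol_a, mol_b), 3))
```

Averaging in this sketch is order-insensitive and cheap, which makes it easy to combine small-fragment vectors into a single vector for a larger fragment or a whole molecule; the trade-off is that the mean discards information about how the fragments are connected.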


Bibliographic Details
Main author: Chakravarti, Suman K.
Format: Online Article Text
Language: English
Published: American Chemical Society, 2018
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6044751/
https://www.ncbi.nlm.nih.gov/pubmed/30023852
http://dx.doi.org/10.1021/acsomega.7b02045
Collection: PubMed
Record ID: pubmed-6044751
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: ACS Omega
Publication date: 2018-03-08
Rights: Copyright © 2018 American Chemical Society. This is an open access article published under an ACS AuthorChoice License (http://pubs.acs.org/page/policy/authorchoice_termsofuse.html), which permits copying and redistribution of the article or any adaptations for non-commercial purposes.