
Improving VAE based molecular representations for compound property prediction


Bibliographic Details
Main Authors: Tevosyan, Ani; Khondkaryan, Lusine; Khachatrian, Hrant; Tadevosyan, Gohar; Apresyan, Lilit; Babayan, Nelly; Stopper, Helga; Navoyan, Zaven
Format: Online Article Text
Language: English
Published: Springer International Publishing 2022
Subjects: Research
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9569108/
https://www.ncbi.nlm.nih.gov/pubmed/36242073
http://dx.doi.org/10.1186/s13321-022-00648-x
author Tevosyan, Ani
Khondkaryan, Lusine
Khachatrian, Hrant
Tadevosyan, Gohar
Apresyan, Lilit
Babayan, Nelly
Stopper, Helga
Navoyan, Zaven
collection PubMed
description Collecting labeled data for many important tasks in chemoinformatics is time-consuming and requires expensive experiments. In recent years, machine learning has been used to learn rich representations of molecules from large-scale unlabeled molecular datasets and to transfer that knowledge to more challenging tasks with limited datasets. Variational autoencoders are one of the tools that have been proposed to perform this transfer for both chemical property prediction and molecular generation tasks. In this work we propose a simple method to improve the chemical property prediction performance of machine learning models by incorporating additional information on correlated molecular descriptors into the representations learned by variational autoencoders. We verify the method on three property prediction tasks. We explore the impact of the number of incorporated descriptors, the correlation between the descriptors and the target properties, and the sizes of the datasets. Finally, we show the relation between the performance of property prediction models and the distance between the property prediction dataset and the larger unlabeled dataset in the representation space. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s13321-022-00648-x.
format Online
Article
Text
id pubmed-9569108
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Springer International Publishing
record_format MEDLINE/PubMed
spelling pubmed-9569108 2022-10-16 Improving VAE based molecular representations for compound property prediction J Cheminform Research Springer International Publishing 2022-10-14 /pmc/articles/PMC9569108/ /pubmed/36242073 http://dx.doi.org/10.1186/s13321-022-00648-x Text en
© The Author(s) 2022. Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title Improving VAE based molecular representations for compound property prediction
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9569108/
https://www.ncbi.nlm.nih.gov/pubmed/36242073
http://dx.doi.org/10.1186/s13321-022-00648-x