
Analyzing Learned Molecular Representations for Property Prediction

Advancements in neural machinery have led to a wide range of algorithmic solutions for molecular property prediction. Two classes of models in particular have yielded promising results: neural networks applied to computed molecular fingerprints or expert-crafted descriptors and gra...


Bibliographic Details
Main Authors: Yang, Kevin; Swanson, Kyle; Jin, Wengong; Coley, Connor; Eiden, Philipp; Gao, Hua; Guzman-Perez, Angel; Hopper, Timothy; Kelley, Brian; Mathea, Miriam; Palmer, Andrew; Settels, Volker; Jaakkola, Tommi; Jensen, Klavs; Barzilay, Regina
Format: Online Article Text
Language: English
Published: American Chemical Society 2019
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6727618/
https://www.ncbi.nlm.nih.gov/pubmed/31361484
http://dx.doi.org/10.1021/acs.jcim.9b00237
author Yang, Kevin
Swanson, Kyle
Jin, Wengong
Coley, Connor
Eiden, Philipp
Gao, Hua
Guzman-Perez, Angel
Hopper, Timothy
Kelley, Brian
Mathea, Miriam
Palmer, Andrew
Settels, Volker
Jaakkola, Tommi
Jensen, Klavs
Barzilay, Regina
author_sort Yang, Kevin
collection PubMed
description Advancements in neural machinery have led to a wide range of algorithmic solutions for molecular property prediction. Two classes of models in particular have yielded promising results: neural networks applied to computed molecular fingerprints or expert-crafted descriptors and graph convolutional neural networks that construct a learned molecular representation by operating on the graph structure of the molecule. However, recent literature has yet to clearly determine which of these two methods is superior when generalizing to new chemical space. Furthermore, prior research has rarely examined these new models in industry research settings in comparison to existing employed models. In this paper, we benchmark models extensively on 19 public and 16 proprietary industrial data sets spanning a wide variety of chemical end points. In addition, we introduce a graph convolutional model that consistently matches or outperforms models using fixed molecular descriptors as well as previous graph neural architectures on both public and proprietary data sets. Our empirical findings indicate that while approaches based on these representations have yet to reach the level of experimental reproducibility, our proposed model nevertheless offers significant improvements over models currently used in industrial workflows.
format Online
Article
Text
id pubmed-6727618
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher American Chemical Society
record_format MEDLINE/PubMed
spelling pubmed-6727618 2019-09-06
Analyzing Learned Molecular Representations for Property Prediction
Yang, Kevin; Swanson, Kyle; Jin, Wengong; Coley, Connor; Eiden, Philipp; Gao, Hua; Guzman-Perez, Angel; Hopper, Timothy; Kelley, Brian; Mathea, Miriam; Palmer, Andrew; Settels, Volker; Jaakkola, Tommi; Jensen, Klavs; Barzilay, Regina
J Chem Inf Model
American Chemical Society 2019-07-30 2019-08-26
/pmc/articles/PMC6727618/ /pubmed/31361484 http://dx.doi.org/10.1021/acs.jcim.9b00237
Text en
Copyright © 2019 American Chemical Society. This is an open access article published under a Creative Commons Non-Commercial No Derivative Works (CC-BY-NC-ND) Attribution License (http://pubs.acs.org/page/policy/authorchoice_ccbyncnd_termsofuse.html), which permits copying and redistribution of the article, and creation of adaptations, all for non-commercial purposes.
title Analyzing Learned Molecular Representations for Property Prediction