
MoleculeNet: a benchmark for molecular machine learning

Molecular machine learning has been maturing rapidly over the last few years. Improved methods and the presence of larger datasets have enabled machine learning algorithms to make increasingly accurate predictions about molecular properties. However, algorithmic progress has been limited due to the...


Bibliographic Details
Main authors: Wu, Zhenqin; Ramsundar, Bharath; Feinberg, Evan N.; Gomes, Joseph; Geniesse, Caleb; Pappu, Aneesh S.; Leswing, Karl; Pande, Vijay
Format: Online Article Text
Language: English
Published: Royal Society of Chemistry 2017
Subjects: Chemistry
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5868307/
https://www.ncbi.nlm.nih.gov/pubmed/29629118
http://dx.doi.org/10.1039/c7sc02664a
collection PubMed
description Molecular machine learning has been maturing rapidly over the last few years. Improved methods and the presence of larger datasets have enabled machine learning algorithms to make increasingly accurate predictions about molecular properties. However, algorithmic progress has been limited due to the lack of a standard benchmark to compare the efficacy of proposed methods; most new algorithms are benchmarked on different datasets making it challenging to gauge the quality of proposed methods. This work introduces MoleculeNet, a large scale benchmark for molecular machine learning. MoleculeNet curates multiple public datasets, establishes metrics for evaluation, and offers high quality open-source implementations of multiple previously proposed molecular featurization and learning algorithms (released as part of the DeepChem open source library). MoleculeNet benchmarks demonstrate that learnable representations are powerful tools for molecular machine learning and broadly offer the best performance. However, this result comes with caveats. Learnable representations still struggle to deal with complex tasks under data scarcity and highly imbalanced classification. For quantum mechanical and biophysical datasets, the use of physics-aware featurizations can be more important than choice of particular learning algorithm.
id pubmed-5868307
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-5868307 2018-04-06
Chem Sci
Chemistry
Royal Society of Chemistry 2017-10-31
/pmc/articles/PMC5868307/
/pubmed/29629118
http://dx.doi.org/10.1039/c7sc02664a
Text en
This journal is © The Royal Society of Chemistry 2018
http://creativecommons.org/licenses/by-nc/3.0/
This article is freely available. This article is licensed under a Creative Commons Attribution Non Commercial 3.0 Unported Licence (CC BY-NC 3.0)
topic Chemistry