MoleculeNet: a benchmark for molecular machine learning

Molecular machine learning has been maturing rapidly over the last few years. Improved methods and the presence of larger datasets have enabled machine learning algorithms to make increasingly accurate predictions about molecular properties. However, algorithmic progress has been limited due to the...

Bibliographic Details
Main Authors:	Wu, Zhenqin; Ramsundar, Bharath; Feinberg, Evan N.; Gomes, Joseph; Geniesse, Caleb; Pappu, Aneesh S.; Leswing, Karl; Pande, Vijay
Format:	Online Article Text
Language:	English
Published:	Royal Society of Chemistry 2017
Subjects:	Chemistry
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5868307/ https://www.ncbi.nlm.nih.gov/pubmed/29629118 http://dx.doi.org/10.1039/c7sc02664a

_version_	1783309128738799616
author	Wu, Zhenqin Ramsundar, Bharath Feinberg, Evan N. Gomes, Joseph Geniesse, Caleb Pappu, Aneesh S. Leswing, Karl Pande, Vijay
author_facet	Wu, Zhenqin Ramsundar, Bharath Feinberg, Evan N. Gomes, Joseph Geniesse, Caleb Pappu, Aneesh S. Leswing, Karl Pande, Vijay
author_sort	Wu, Zhenqin
collection	PubMed
description	Molecular machine learning has been maturing rapidly over the last few years. Improved methods and the presence of larger datasets have enabled machine learning algorithms to make increasingly accurate predictions about molecular properties. However, algorithmic progress has been limited due to the lack of a standard benchmark to compare the efficacy of proposed methods; most new algorithms are benchmarked on different datasets making it challenging to gauge the quality of proposed methods. This work introduces MoleculeNet, a large scale benchmark for molecular machine learning. MoleculeNet curates multiple public datasets, establishes metrics for evaluation, and offers high quality open-source implementations of multiple previously proposed molecular featurization and learning algorithms (released as part of the DeepChem open source library). MoleculeNet benchmarks demonstrate that learnable representations are powerful tools for molecular machine learning and broadly offer the best performance. However, this result comes with caveats. Learnable representations still struggle to deal with complex tasks under data scarcity and highly imbalanced classification. For quantum mechanical and biophysical datasets, the use of physics-aware featurizations can be more important than choice of particular learning algorithm.
format	Online Article Text
id	pubmed-5868307
institution	National Center for Biotechnology Information
language	English
publishDate	2017
publisher	Royal Society of Chemistry
record_format	MEDLINE/PubMed
spelling	pubmed-58683072018-04-06 MoleculeNet: a benchmark for molecular machine learning Wu, Zhenqin Ramsundar, Bharath Feinberg, Evan N. Gomes, Joseph Geniesse, Caleb Pappu, Aneesh S. Leswing, Karl Pande, Vijay Chem Sci Chemistry Molecular machine learning has been maturing rapidly over the last few years. Improved methods and the presence of larger datasets have enabled machine learning algorithms to make increasingly accurate predictions about molecular properties. However, algorithmic progress has been limited due to the lack of a standard benchmark to compare the efficacy of proposed methods; most new algorithms are benchmarked on different datasets making it challenging to gauge the quality of proposed methods. This work introduces MoleculeNet, a large scale benchmark for molecular machine learning. MoleculeNet curates multiple public datasets, establishes metrics for evaluation, and offers high quality open-source implementations of multiple previously proposed molecular featurization and learning algorithms (released as part of the DeepChem open source library). MoleculeNet benchmarks demonstrate that learnable representations are powerful tools for molecular machine learning and broadly offer the best performance. However, this result comes with caveats. Learnable representations still struggle to deal with complex tasks under data scarcity and highly imbalanced classification. For quantum mechanical and biophysical datasets, the use of physics-aware featurizations can be more important than choice of particular learning algorithm. Royal Society of Chemistry 2017-10-31 /pmc/articles/PMC5868307/ /pubmed/29629118 http://dx.doi.org/10.1039/c7sc02664a Text en This journal is © The Royal Society of Chemistry 2018 http://creativecommons.org/licenses/by-nc/3.0/ This article is freely available. This article is licensed under a Creative Commons Attribution Non Commercial 3.0 Unported Licence (CC BY-NC 3.0)
spellingShingle	Chemistry Wu, Zhenqin Ramsundar, Bharath Feinberg, Evan N. Gomes, Joseph Geniesse, Caleb Pappu, Aneesh S. Leswing, Karl Pande, Vijay MoleculeNet: a benchmark for molecular machine learning
title	MoleculeNet: a benchmark for molecular machine learning
title_full	MoleculeNet: a benchmark for molecular machine learning
title_fullStr	MoleculeNet: a benchmark for molecular machine learning
title_full_unstemmed	MoleculeNet: a benchmark for molecular machine learning
title_short	MoleculeNet: a benchmark for molecular machine learning
title_sort	moleculenet: a benchmark for molecular machine learning
topic	Chemistry
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5868307/ https://www.ncbi.nlm.nih.gov/pubmed/29629118 http://dx.doi.org/10.1039/c7sc02664a
work_keys_str_mv	AT wuzhenqin moleculenetabenchmarkformolecularmachinelearning AT ramsundarbharath moleculenetabenchmarkformolecularmachinelearning AT feinbergevann moleculenetabenchmarkformolecularmachinelearning AT gomesjoseph moleculenetabenchmarkformolecularmachinelearning AT geniessecaleb moleculenetabenchmarkformolecularmachinelearning AT pappuaneeshs moleculenetabenchmarkformolecularmachinelearning AT leswingkarl moleculenetabenchmarkformolecularmachinelearning AT pandevijay moleculenetabenchmarkformolecularmachinelearning

Similar Items