MoleculeNet: a benchmark for molecular machine learning
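Main authors: | Wu, Zhenqin; Ramsundar, Bharath; Feinberg, Evan N.; Gomes, Joseph; Geniesse, Caleb; Pappu, Aneesh S.; Leswing, Karl; Pande, Vijay |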
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Royal Society of Chemistry, 2017 |
Subjects: | Chemistry |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5868307/ https://www.ncbi.nlm.nih.gov/pubmed/29629118 http://dx.doi.org/10.1039/c7sc02664a |
author | Wu, Zhenqin; Ramsundar, Bharath; Feinberg, Evan N.; Gomes, Joseph; Geniesse, Caleb; Pappu, Aneesh S.; Leswing, Karl; Pande, Vijay |
collection | PubMed |
description | Molecular machine learning has been maturing rapidly over the last few years. Improved methods and the presence of larger datasets have enabled machine learning algorithms to make increasingly accurate predictions about molecular properties. However, algorithmic progress has been limited due to the lack of a standard benchmark to compare the efficacy of proposed methods; most new algorithms are benchmarked on different datasets, making it challenging to gauge the quality of proposed methods. This work introduces MoleculeNet, a large-scale benchmark for molecular machine learning. MoleculeNet curates multiple public datasets, establishes metrics for evaluation, and offers high-quality open-source implementations of multiple previously proposed molecular featurization and learning algorithms (released as part of the DeepChem open source library). MoleculeNet benchmarks demonstrate that learnable representations are powerful tools for molecular machine learning and broadly offer the best performance. However, this result comes with caveats. Learnable representations still struggle to deal with complex tasks under data scarcity and highly imbalanced classification. For quantum mechanical and biophysical datasets, the use of physics-aware featurizations can be more important than the choice of a particular learning algorithm. |
format | Online Article Text |
id | pubmed-5868307 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2017 |
publisher | Royal Society of Chemistry |
record_format | MEDLINE/PubMed |
spelling | pubmed-5868307 2018-04-06; Chem Sci; Chemistry; Royal Society of Chemistry 2017-10-31; /pmc/articles/PMC5868307/ /pubmed/29629118 http://dx.doi.org/10.1039/c7sc02664a; Text en; This journal is © The Royal Society of Chemistry 2018; http://creativecommons.org/licenses/by-nc/3.0/ This article is freely available. This article is licensed under a Creative Commons Attribution Non Commercial 3.0 Unported Licence (CC BY-NC 3.0) |
title | MoleculeNet: a benchmark for molecular machine learning |
topic | Chemistry |