ANI-1, A data set of 20 million calculated off-equilibrium conformations for organic molecules
One of the grand challenges in modern theoretical chemistry is designing and implementing approximations that expedite ab initio methods without loss of accuracy. Machine learning (ML) methods are emerging as a powerful approach to constructing various forms of transferable atomistic potentials. The...
Main authors: | Smith, Justin S.; Isayev, Olexandr; Roitberg, Adrian E. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Nature Publishing Group, 2017 |
Subjects: | Data Descriptor |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5735918/ https://www.ncbi.nlm.nih.gov/pubmed/29257127 http://dx.doi.org/10.1038/sdata.2017.193 |
_version_ | 1783287293409230848 |
---|---|
author | Smith, Justin S. Isayev, Olexandr Roitberg, Adrian E. |
author_facet | Smith, Justin S. Isayev, Olexandr Roitberg, Adrian E. |
author_sort | Smith, Justin S. |
collection | PubMed |
description | One of the grand challenges in modern theoretical chemistry is designing and implementing approximations that expedite ab initio methods without loss of accuracy. Machine learning (ML) methods are emerging as a powerful approach to constructing various forms of transferable atomistic potentials. They have been successfully applied in a variety of applications in chemistry, biology, catalysis, and solid-state physics. However, these models are heavily dependent on the quality and quantity of data used in their fitting. Fitting highly flexible ML potentials, such as neural networks, comes at a cost: a vast amount of reference data is required to properly train these models. We address this need by providing access to a large computational DFT database, which consists of more than 20 million off-equilibrium conformations for 57,462 small organic molecules. We believe it will become a new standard benchmark for comparison of current and future methods in the ML potential community. |
format | Online Article Text |
id | pubmed-5735918 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2017 |
publisher | Nature Publishing Group |
record_format | MEDLINE/PubMed |
spelling | pubmed-57359182017-12-21 ANI-1, A data set of 20 million calculated off-equilibrium conformations for organic molecules Smith, Justin S. Isayev, Olexandr Roitberg, Adrian E. Sci Data Data Descriptor One of the grand challenges in modern theoretical chemistry is designing and implementing approximations that expedite ab initio methods without loss of accuracy. Machine learning (ML) methods are emerging as a powerful approach to constructing various forms of transferable atomistic potentials. They have been successfully applied in a variety of applications in chemistry, biology, catalysis, and solid-state physics. However, these models are heavily dependent on the quality and quantity of data used in their fitting. Fitting highly flexible ML potentials, such as neural networks, comes at a cost: a vast amount of reference data is required to properly train these models. We address this need by providing access to a large computational DFT database, which consists of more than 20 M off equilibrium conformations for 57,462 small organic molecules. We believe it will become a new standard benchmark for comparison of current and future methods in the ML potential community. Nature Publishing Group 2017-12-19 /pmc/articles/PMC5735918/ /pubmed/29257127 http://dx.doi.org/10.1038/sdata.2017.193 Text en Copyright © 2017, The Author(s) http://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. 
If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/ The Creative Commons Public Domain Dedication waiver http://creativecommons.org/publicdomain/zero/1.0/ applies to the metadata files made available in this article. |
spellingShingle | Data Descriptor Smith, Justin S. Isayev, Olexandr Roitberg, Adrian E. ANI-1, A data set of 20 million calculated off-equilibrium conformations for organic molecules |
title | ANI-1, A data set of 20 million calculated off-equilibrium conformations for organic molecules |
title_full | ANI-1, A data set of 20 million calculated off-equilibrium conformations for organic molecules |
title_fullStr | ANI-1, A data set of 20 million calculated off-equilibrium conformations for organic molecules |
title_full_unstemmed | ANI-1, A data set of 20 million calculated off-equilibrium conformations for organic molecules |
title_short | ANI-1, A data set of 20 million calculated off-equilibrium conformations for organic molecules |
title_sort | ani-1, a data set of 20 million calculated off-equilibrium conformations for organic molecules |
topic | Data Descriptor |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5735918/ https://www.ncbi.nlm.nih.gov/pubmed/29257127 http://dx.doi.org/10.1038/sdata.2017.193 |
work_keys_str_mv | AT smithjustins ani1adatasetof20millioncalculatedoffequilibriumconformationsfororganicmolecules AT isayevolexandr ani1adatasetof20millioncalculatedoffequilibriumconformationsfororganicmolecules AT roitbergadriane ani1adatasetof20millioncalculatedoffequilibriumconformationsfororganicmolecules |