The ANI-1ccx and ANI-1x data sets, coupled-cluster and density functional theory properties for molecules
Maximum diversification of data is a central theme in building generalized and accurate machine learning (ML) models. In chemistry, ML has been used to develop models for predicting molecular properties, for example quantum mechanics (QM) calculated potential energy surfaces and atomic charge models...
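Main Authors: | Smith, Justin S.; Zubatyuk, Roman; Nebgen, Benjamin; Lubbers, Nicholas; Barros, Kipton; Roitberg, Adrian E.; Isayev, Olexandr; Tretiak, Sergei |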
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Nature Publishing Group UK 2020 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7195467/ https://www.ncbi.nlm.nih.gov/pubmed/32358545 http://dx.doi.org/10.1038/s41597-020-0473-z |
_version_ | 1783528540830957568 |
---|---|
author | Smith, Justin S. Zubatyuk, Roman Nebgen, Benjamin Lubbers, Nicholas Barros, Kipton Roitberg, Adrian E. Isayev, Olexandr Tretiak, Sergei |
author_facet | Smith, Justin S. Zubatyuk, Roman Nebgen, Benjamin Lubbers, Nicholas Barros, Kipton Roitberg, Adrian E. Isayev, Olexandr Tretiak, Sergei |
author_sort | Smith, Justin S. |
collection | PubMed |
description | Maximum diversification of data is a central theme in building generalized and accurate machine learning (ML) models. In chemistry, ML has been used to develop models for predicting molecular properties, for example quantum mechanics (QM) calculated potential energy surfaces and atomic charge models. The ANI-1x and ANI-1ccx ML-based general-purpose potentials for organic molecules were developed through active learning; an automated data diversification process. Here, we describe the ANI-1x and ANI-1ccx data sets. To demonstrate data diversity, we visualize it with a dimensionality reduction scheme, and contrast against existing data sets. The ANI-1x data set contains multiple QM properties from 5 M density functional theory calculations, while the ANI-1ccx data set contains 500 k data points obtained with an accurate CCSD(T)/CBS extrapolation. Approximately 14 million CPU core-hours were expended to generate this data. Multiple QM calculated properties for the chemical elements C, H, N, and O are provided: energies, atomic forces, multipole moments, atomic charges, etc. We provide this data to the community to aid research and development of ML models for chemistry. |
format | Online Article Text |
id | pubmed-7195467 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | Nature Publishing Group UK |
record_format | MEDLINE/PubMed |
spelling | pubmed-7195467 2020-05-06 The ANI-1ccx and ANI-1x data sets, coupled-cluster and density functional theory properties for molecules Smith, Justin S. Zubatyuk, Roman Nebgen, Benjamin Lubbers, Nicholas Barros, Kipton Roitberg, Adrian E. Isayev, Olexandr Tretiak, Sergei Sci Data Data Descriptor Maximum diversification of data is a central theme in building generalized and accurate machine learning (ML) models. In chemistry, ML has been used to develop models for predicting molecular properties, for example quantum mechanics (QM) calculated potential energy surfaces and atomic charge models. The ANI-1x and ANI-1ccx ML-based general-purpose potentials for organic molecules were developed through active learning; an automated data diversification process. Here, we describe the ANI-1x and ANI-1ccx data sets. To demonstrate data diversity, we visualize it with a dimensionality reduction scheme, and contrast against existing data sets. The ANI-1x data set contains multiple QM properties from 5 M density functional theory calculations, while the ANI-1ccx data set contains 500 k data points obtained with an accurate CCSD(T)/CBS extrapolation. Approximately 14 million CPU core-hours were expended to generate this data. Multiple QM calculated properties for the chemical elements C, H, N, and O are provided: energies, atomic forces, multipole moments, atomic charges, etc. We provide this data to the community to aid research and development of ML models for chemistry. Nature Publishing Group UK 2020-05-01 /pmc/articles/PMC7195467/ /pubmed/32358545 http://dx.doi.org/10.1038/s41597-020-0473-z Text en © This is a U.S. government work and not under copyright protection in the U.S.; foreign copyright protection may apply 2020 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver http://creativecommons.org/publicdomain/zero/1.0/ applies to the metadata files associated with this article. |
spellingShingle | Data Descriptor Smith, Justin S. Zubatyuk, Roman Nebgen, Benjamin Lubbers, Nicholas Barros, Kipton Roitberg, Adrian E. Isayev, Olexandr Tretiak, Sergei The ANI-1ccx and ANI-1x data sets, coupled-cluster and density functional theory properties for molecules |
title | The ANI-1ccx and ANI-1x data sets, coupled-cluster and density functional theory properties for molecules |
title_full | The ANI-1ccx and ANI-1x data sets, coupled-cluster and density functional theory properties for molecules |
title_fullStr | The ANI-1ccx and ANI-1x data sets, coupled-cluster and density functional theory properties for molecules |
title_full_unstemmed | The ANI-1ccx and ANI-1x data sets, coupled-cluster and density functional theory properties for molecules |
title_short | The ANI-1ccx and ANI-1x data sets, coupled-cluster and density functional theory properties for molecules |
title_sort | ani-1ccx and ani-1x data sets, coupled-cluster and density functional theory properties for molecules |
topic | Data Descriptor |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7195467/ https://www.ncbi.nlm.nih.gov/pubmed/32358545 http://dx.doi.org/10.1038/s41597-020-0473-z |