The ANI-1ccx and ANI-1x data sets, coupled-cluster and density functional theory properties for molecules

Maximum diversification of data is a central theme in building generalized and accurate machine learning (ML) models. In chemistry, ML has been used to develop models for predicting molecular properties, for example quantum mechanics (QM) calculated potential energy surfaces and atomic charge models...

Bibliographic Details
Main authors: Smith, Justin S., Zubatyuk, Roman, Nebgen, Benjamin, Lubbers, Nicholas, Barros, Kipton, Roitberg, Adrian E., Isayev, Olexandr, Tretiak, Sergei
Format: Online Article Text
Language: English
Published: Nature Publishing Group UK 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7195467/
https://www.ncbi.nlm.nih.gov/pubmed/32358545
http://dx.doi.org/10.1038/s41597-020-0473-z
author Smith, Justin S.
Zubatyuk, Roman
Nebgen, Benjamin
Lubbers, Nicholas
Barros, Kipton
Roitberg, Adrian E.
Isayev, Olexandr
Tretiak, Sergei
collection PubMed
description Maximum diversification of data is a central theme in building generalized and accurate machine learning (ML) models. In chemistry, ML has been used to develop models for predicting molecular properties, for example quantum mechanics (QM) calculated potential energy surfaces and atomic charge models. The ANI-1x and ANI-1ccx ML-based general-purpose potentials for organic molecules were developed through active learning, an automated data diversification process. Here, we describe the ANI-1x and ANI-1ccx data sets. To demonstrate data diversity, we visualize it with a dimensionality reduction scheme, and contrast against existing data sets. The ANI-1x data set contains multiple QM properties from 5 M density functional theory calculations, while the ANI-1ccx data set contains 500 k data points obtained with an accurate CCSD(T)/CBS extrapolation. Approximately 14 million CPU core-hours were expended to generate this data. Multiple QM calculated properties for the chemical elements C, H, N, and O are provided: energies, atomic forces, multipole moments, atomic charges, etc. We provide this data to the community to aid research and development of ML models for chemistry.
format Online
Article
Text
id pubmed-7195467
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Nature Publishing Group UK
record_format MEDLINE/PubMed
spelling pubmed-7195467 2020-05-06 The ANI-1ccx and ANI-1x data sets, coupled-cluster and density functional theory properties for molecules Smith, Justin S. Zubatyuk, Roman Nebgen, Benjamin Lubbers, Nicholas Barros, Kipton Roitberg, Adrian E. Isayev, Olexandr Tretiak, Sergei Sci Data Data Descriptor Nature Publishing Group UK 2020-05-01 /pmc/articles/PMC7195467/ /pubmed/32358545 http://dx.doi.org/10.1038/s41597-020-0473-z Text en © This is a U.S. government work and not under copyright protection in the U.S.; foreign copyright protection may apply 2020 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver http://creativecommons.org/publicdomain/zero/1.0/ applies to the metadata files associated with this article.
title The ANI-1ccx and ANI-1x data sets, coupled-cluster and density functional theory properties for molecules
topic Data Descriptor