Clusterome: A Comprehensive Data Set of Atmospheric Molecular Clusters for Machine Learning Applications

Bibliographic Details
Main authors: Knattrup, Yosef; Kubečka, Jakub; Ayoubi, Daniel; Elm, Jonas
Format: Online article (text)
Language: English
Published: American Chemical Society, 2023
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10357536/
https://www.ncbi.nlm.nih.gov/pubmed/37483242
http://dx.doi.org/10.1021/acsomega.3c02203
collection PubMed
description Formation and growth of atmospheric molecular clusters into aerosol particles impact the global climate and contribute to the high uncertainty in modern climate models. Cluster formation is usually studied using quantum chemical methods, which quickly become computationally expensive as system sizes grow. In this work, we present a large database of ∼250k atmospherically relevant cluster structures, which can be applied to developing machine learning (ML) models. The database is used to train a kernel ridge regression (KRR) ML model with the FCHL19 representation. We test the ability of the model to extrapolate from smaller to larger clusters, between different molecules, and between equilibrium and out-of-equilibrium structures, as well as its transferability to systems with new interactions. We show that KRR models can extrapolate to larger sizes and transfer acid and base interactions with mean absolute errors below 1 kcal/mol. We suggest introducing an iterative ML step in configurational sampling processes, which can reduce the computational expense. Such an approach would allow us to study significantly more cluster systems at higher accuracy than previously possible and thereby cover a much larger part of relevant atmospheric compounds.
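The abstract describes training kernel ridge regression (KRR) on cluster energies and judging it by the mean absolute error (MAE). Below is a minimal pure-Python sketch of plain KRR, assuming each cluster structure has already been converted into a fixed-length feature vector. It uses a simple Gaussian kernel as a stand-in for the FCHL19 representation and kernel used in the paper, and the names `KernelRidge`, `screen_lowest`, `sigma` and `lam` are hypothetical. `screen_lowest` only illustrates how an ML step might pre-rank candidate structures during configurational sampling before costlier quantum chemical calculations; it is not the authors' workflow.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def gaussian_kernel(a: Vector, b: Vector, sigma: float) -> float:
    """Gaussian similarity between two fixed-length feature vectors."""
    sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def solve_linear(matrix: List[List[float]], rhs: List[float]) -> List[float]:
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    # Augmented copy so the caller's matrix is not modified.
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        acc = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - acc) / aug[r][r]
    return x


class KernelRidge:
    """Minimal kernel ridge regression: alpha = (K + lam*I)^-1 y."""

    def __init__(self, sigma: float = 1.0, lam: float = 1e-8) -> None:
        self.sigma = sigma  # kernel width (hypothetical default)
        self.lam = lam      # ridge regularisation strength (hypothetical default)
        self.train_x: List[Vector] = []
        self.alpha: List[float] = []

    def fit(self, xs: List[Vector], ys: List[float]) -> "KernelRidge":
        n = len(xs)
        k = [[gaussian_kernel(xs[i], xs[j], self.sigma) for j in range(n)]
             for i in range(n)]
        for i in range(n):
            k[i][i] += self.lam  # add the ridge term to the diagonal
        self.train_x = list(xs)
        self.alpha = solve_linear(k, list(ys))
        return self

    def predict(self, xs: List[Vector]) -> List[float]:
        # f(x) = sum_i alpha_i * k(x, x_i) over the training set.
        return [sum(a * gaussian_kernel(x, t, self.sigma)
                    for a, t in zip(self.alpha, self.train_x))
                for x in xs]


def mean_absolute_error(pred: List[float], ref: List[float]) -> float:
    return sum(abs(p - r) for p, r in zip(pred, ref)) / len(ref)


def screen_lowest(model: KernelRidge, candidates: List[Vector], keep: int) -> List[int]:
    """Indices of the `keep` candidates with the lowest predicted energy."""
    energies = model.predict(candidates)
    return sorted(range(len(candidates)), key=lambda i: energies[i])[:keep]
```

For example, fitting on toy one-dimensional features `[[0.0], [0.5], [1.0], [1.5], [2.0]]` with targets `x**2` and then calling `screen_lowest(model, [[1.9], [0.1], [1.0]], 2)` should return the indices of the two candidates with the lowest predicted values. In practice, `sigma` and `lam` would be tuned, for example by cross-validation, and the MAE would be measured on held-out structures such as larger clusters.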
id pubmed-10357536
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
journal ACS Omega, 2023-06-30
rights © 2023 The Authors. Published by American Chemical Society under the CC BY-NC-ND 4.0 license (https://creativecommons.org/licenses/by-nc-nd/4.0/), which permits non-commercial access and re-use, provided that author attribution and integrity are maintained, but does not permit the creation of adaptations or other derivative works.