Machine Learning Spectroscopy Using a 2-Stage, Generalized Constituent Contribution Protocol

Bibliographic Details
Main authors: Fan, Jinming; Qian, Chao; Zhou, Shaodong
Format: Online, Article, Text
Language: English
Published: AAAS, 2023
Subjects: Research Article
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10243197/
https://www.ncbi.nlm.nih.gov/pubmed/37287889
http://dx.doi.org/10.34133/research.0115
Journal: Research (Wash D C)
Article type: Research Article
Published online: 2023-04-20
Copyright: © 2023 Jinming Fan et al. Exclusive Licensee Science and Technology Review Publishing House. No claim to original U.S. Government Works. Distributed under a Creative Commons Attribution License 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/).
Collection: PubMed
Institution: National Center for Biotechnology Information
Record ID: pubmed-10243197
Record date: 2023-06-07
Record format: MEDLINE/PubMed

Abstract: A corrected group contribution (CGC)–molecule contribution (MC)–Bayesian neural network (BNN) protocol for accurate prediction of absorption spectra is presented. Upon combination of BNN with CGC methods, the full absorption spectra of various molecules are afforded accurately and efficiently, using only a small dataset for training. Here, with a small training sample (<100), accurate prediction of maximum wavelength for single molecules is afforded with the first stage of the protocol; by contrast, previously reported machine learning (ML) methods require >1,000 samples to ensure the accuracy of prediction. Furthermore, with <500 samples, the mean square error in the prediction of full ultraviolet spectra reaches <2%; for comparison, ML models with molecular SMILES for training require a much larger dataset (>2,000) to achieve comparable accuracy. Moreover, by employing an MC method designed specifically for CGC that properly interprets the mixing rule, the spectra of mixtures are obtained with high accuracy. The logical origins of the good performance of the protocol are discussed in detail. Considering that such a constituent contribution protocol combines chemical principles and data-driven tools, most likely, it will be proven efficient to solve molecular-property-relevant problems in wider fields.