
Rapid Approximate Subset-Based Spectra Prediction for Electron Ionization–Mass Spectrometry

Bibliographic Details
Main Authors: Zhu, Richard Licheng; Jonas, Eric
Format: Online, Article, Text
Language: English
Published: American Chemical Society, 2023
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9909676/
https://www.ncbi.nlm.nih.gov/pubmed/36695638
http://dx.doi.org/10.1021/acs.analchem.2c02093
author Zhu, Richard Licheng
Jonas, Eric
collection PubMed
description Mass spectrometry is a vital tool in the analytical chemist’s toolkit, commonly used to identify the presence of known compounds and elucidate unknown chemical structures. All of these applications rely on having previously measured spectra for known substances. Computational methods for predicting mass spectra from chemical structures can be used to augment existing spectral databases with predicted spectra from previously unmeasured molecules. In this paper, we present a method for prediction of electron ionization–mass spectra (EI–MS) of small molecules that combines physically plausible substructure enumeration and deep learning, which we term rapid approximate subset-based spectra prediction (RASSP). The first of our two models, FormulaNet, produces a probability distribution over chemical subformulae to achieve a state-of-the-art forward prediction accuracy of 92.9% weighted (Stein) dot product and a database lookup recall (within the top 10 ranked spectra) of 98.0% when evaluated against the NIST 2017 Mass Spectral Library. The second model, SubsetNet, produces a probability distribution over vertex subsets of the original molecule graph to achieve similar forward prediction accuracy and superior generalization in the high-resolution, low-data regime. Spectra predicted by our best model improve upon the previous state-of-the-art spectral database lookup error rate by a factor of 2.9×, reducing the lookup error (top 10) from 5.7% to 2.0%. Both models can train on and predict spectral data at arbitrary resolution. Source code and predicted EI–MS spectra for 73.2M small molecules from PubChem will be made freely accessible online.
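The two evaluation metrics quoted in the description, the weighted (Stein) dot product and top-10 database lookup recall, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: spectra are assumed to be pre-binned dictionaries mapping m/z to intensity, the mass and intensity exponents (3 and 0.6) are assumed defaults rather than values taken from the paper, and whether the similarity is squared varies between implementations. The names `weighted_dot`, `lookup_recall_at_k`, and `Spectrum` are hypothetical.

```python
import math
from typing import Dict

# Hypothetical representation: a spectrum maps a binned m/z value to intensity.
Spectrum = Dict[float, float]


def weighted_dot(a: Spectrum, b: Spectrum, mass_pow: float = 3.0,
                 int_pow: float = 0.6, squared: bool = False) -> float:
    """Mass-weighted cosine similarity between two binned spectra.

    Each peak becomes (m/z ** mass_pow) * (intensity ** int_pow) before the
    normalized dot product is taken. The default exponents are an assumption;
    squared=True returns the squared variant used by some implementations.
    """
    def weigh(s: Spectrum) -> Dict[float, float]:
        return {mz: (mz ** mass_pow) * (i ** int_pow) for mz, i in s.items() if i > 0}

    wa, wb = weigh(a), weigh(b)
    # Peaks only match on identical m/z keys, so both spectra need the same binning.
    num = sum(w * wb.get(mz, 0.0) for mz, w in wa.items())
    norm = math.sqrt(sum(w * w for w in wa.values()) * sum(w * w for w in wb.values()))
    if norm == 0.0:
        return 0.0
    cos = num / norm
    return cos * cos if squared else cos


def lookup_recall_at_k(queries: Dict[str, Spectrum],
                       library: Dict[str, Spectrum], k: int = 10) -> float:
    """Fraction of queries whose true compound ID ranks in the top k library hits.

    `queries` maps compound IDs to measured spectra and `library` maps compound
    IDs to predicted spectra; both are hypothetical inputs for illustration.
    """
    if not queries:
        return 0.0
    hits = 0
    for cid, measured in queries.items():
        # Rank every library entry by similarity to the measured spectrum.
        ranked = sorted(library,
                        key=lambda lid: weighted_dot(measured, library[lid]),
                        reverse=True)
        if cid in ranked[:k]:
            hits += 1
    return hits / len(queries)


if __name__ == "__main__":
    library = {
        "A": {41.0: 100.0, 43.0: 60.0, 58.0: 20.0},
        "B": {51.0: 40.0, 77.0: 100.0},
    }
    queries = {"A": {41.0: 90.0, 43.0: 70.0, 58.0: 15.0}}
    print(lookup_recall_at_k(queries, library, k=1))  # 1.0
```

Under this reading, "lookup error (top 10)" is 1 − recall@10, so the reported 98.0% recall corresponds to the 2.0% error, and the 2.9× improvement is the ratio 5.7% / 2.0% ≈ 2.85, rounded.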
format Online
Article
Text
id pubmed-9909676
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher American Chemical Society
record_format MEDLINE/PubMed
journal Anal Chem
published 2023-01-25
rights © 2023 The Authors. Published by American Chemical Society under a CC BY 4.0 license (https://creativecommons.org/licenses/by/4.0/), which permits the broadest form of re-use, including for commercial purposes, provided that author attribution and integrity are maintained.
title Rapid Approximate Subset-Based Spectra Prediction for Electron Ionization–Mass Spectrometry