A framework for automated structure elucidation from routine NMR spectra

Bibliographic Details
Main authors: Huang, Zhaorui; Chen, Michael S.; Woroch, Cristian P.; Markland, Thomas E.; Kanan, Matthew W.
Format: Online Article Text
Language: English
Published: The Royal Society of Chemistry 2021
Subjects: Chemistry
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8635205/
https://www.ncbi.nlm.nih.gov/pubmed/34976353
http://dx.doi.org/10.1039/d1sc04105c
collection PubMed
description Methods to automate structure elucidation that can be applied broadly across chemical structure space have the potential to greatly accelerate chemical discovery. NMR spectroscopy is the most widely used and arguably the most powerful method for elucidating the structures of organic molecules. Here we introduce a machine learning (ML) framework that provides a quantitative probabilistic ranking of the most likely structural connectivity of an unknown compound when given routine, experimental one-dimensional ¹H and/or ¹³C NMR spectra. In particular, our ML-based algorithm takes input NMR spectra and (i) predicts the presence of specific substructures out of hundreds of substructures it has learned to identify; (ii) annotates the spectrum to label peaks with predicted substructures; and (iii) uses the substructures to construct candidate constitutional isomers and assign them a probabilistic ranking. Using experimental spectra and molecular formulae for molecules containing up to 10 non-hydrogen atoms, the correct constitutional isomer was the highest-ranking prediction made by our model in 67.4% of the cases and one of the top-ten predictions in 95.8% of the cases. This advance will aid in solving the structures of unknown compounds and thus further the development of automated structure elucidation tools that could enable the creation of fully autonomous reaction discovery platforms.
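The description outlines a three-stage pipeline ending in a probabilistic ranking of candidate isomers, evaluated by top-1 and top-10 accuracy. As a rough illustration of that last stage and of the metric, here is a minimal Python sketch. It is not the authors' model: the assumption that substructure probabilities are independent, the softmax normalization over candidates, and every name used (`score_candidate`, `rank_candidates`, `top_k_accuracy`, the substructure labels, the isomer names) are hypothetical and chosen only for illustration. Candidate generation from the molecular formula is assumed to have happened already.

```python
import math
from typing import Dict, List, Sequence, Set, Tuple


def score_candidate(substructures: Set[str], probs: Dict[str, float], eps: float = 1e-9) -> float:
    """Log-likelihood of one candidate's presence/absence pattern over all
    predicted substructures, ASSUMING the per-substructure probabilities are
    independent (a simplification made for this sketch only)."""
    total = 0.0
    for name, p in probs.items():
        p = min(max(p, eps), 1.0 - eps)  # clip so log() never sees 0
        total += math.log(p) if name in substructures else math.log(1.0 - p)
    return total


def rank_candidates(candidates: Dict[str, Set[str]], probs: Dict[str, float]) -> List[Tuple[str, float]]:
    """Normalize candidate scores into probabilities (softmax over
    log-likelihoods) and return (candidate, probability) pairs, best first."""
    scores = {c: score_candidate(s, probs) for c, s in candidates.items()}
    best = max(scores.values())  # subtract the max for numerical stability
    weights = {c: math.exp(v - best) for c, v in scores.items()}
    z = sum(weights.values())
    return sorted(((c, w / z) for c, w in weights.items()), key=lambda t: t[1], reverse=True)


def top_k_accuracy(rankings: Sequence[Sequence[str]], truths: Sequence[str], k: int) -> float:
    """Fraction of test cases whose true structure appears among the top k."""
    hits = sum(1 for ranked, truth in zip(rankings, truths) if truth in ranked[:k])
    return hits / len(truths)


# Hypothetical example: three predicted substructure probabilities, three candidates.
probs = {"carbonyl": 0.9, "hydroxyl": 0.2, "ether_O": 0.7}
candidates = {
    "isomer_A": {"carbonyl", "ether_O"},
    "isomer_B": {"carbonyl", "hydroxyl"},
    "isomer_C": {"ether_O"},
}
ranking = rank_candidates(candidates, probs)
print(ranking[0])  # isomer_A first, probability ≈ 0.82
```

With these made-up numbers, isomer_A holds about 0.82 of the normalized probability mass. Calling `top_k_accuracy` with k = 1 and k = 10 over a test set corresponds to the two kinds of accuracy figure reported in the description; the independence assumption is the main simplification and would likely be replaced by a learned model in practice.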
id pubmed-8635205
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-8635205 2021-12-30; Chem Sci (Chemistry); The Royal Society of Chemistry 2021-11-09; This journal is © The Royal Society of Chemistry, https://creativecommons.org/licenses/by-nc/3.0/
topic Chemistry