
An end-to-end deep learning framework for translating mass spectra to de-novo molecules

Elucidating the structure of a chemical compound is a fundamental task in chemistry with applications in multiple domains including drug discovery, precision medicine, and biomarker discovery. The common practice for elucidating the structure of a compound is to obtain a mass spectrum and subsequent...


Bibliographic Details
Main authors: Litsa, Eleni E., Chenthamarakshan, Vijil, Das, Payel, Kavraki, Lydia E.
Format: Online Article Text
Language: English
Published: Nature Publishing Group UK 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10290119/
https://www.ncbi.nlm.nih.gov/pubmed/37353554
http://dx.doi.org/10.1038/s42004-023-00932-3
author Litsa, Eleni E.
Chenthamarakshan, Vijil
Das, Payel
Kavraki, Lydia E.
collection PubMed
description Elucidating the structure of a chemical compound is a fundamental task in chemistry, with applications in multiple domains including drug discovery, precision medicine, and biomarker discovery. The common practice for elucidating the structure of a compound is to obtain a mass spectrum and subsequently retrieve its structure from spectral databases. However, these methods fail for novel molecules that are not present in the reference database. We propose Spec2Mol, a deep learning architecture for molecular structure recommendation given mass spectra alone. Spec2Mol is inspired by the Speech2Text deep learning architectures for translating audio signals into text. Our approach is based on an encoder-decoder architecture. The encoder learns the spectra embeddings, while the decoder, pre-trained on a massive dataset of chemical structures for translating between different molecular representations, reconstructs SMILES sequences of the recommended chemical structures. We have evaluated Spec2Mol by assessing the molecular similarity between the recommended structures and the original structure. Our analysis showed that Spec2Mol is able to identify the presence of key molecular substructures of a compound from its mass spectrum, and performs on par with existing fragmentation tree methods, particularly when test structure information is not available during training or present in the reference database.
format Online
Article
Text
id pubmed-10290119
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Nature Publishing Group UK
record_format MEDLINE/PubMed
spelling pubmed-10290119 2023-06-25 An end-to-end deep learning framework for translating mass spectra to de-novo molecules Litsa, Eleni E. Chenthamarakshan, Vijil Das, Payel Kavraki, Lydia E. Commun Chem Article
Nature Publishing Group UK 2023-06-23 /pmc/articles/PMC10290119/ /pubmed/37353554 http://dx.doi.org/10.1038/s42004-023-00932-3 Text en © The Author(s) 2023 https://creativecommons.org/licenses/by/4.0/ Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
title An end-to-end deep learning framework for translating mass spectra to de-novo molecules
topic Article