An end-to-end deep learning framework for translating mass spectra to de-novo molecules
Elucidating the structure of a chemical compound is a fundamental task in chemistry with applications in multiple domains including drug discovery, precision medicine, and biomarker discovery. The common practice for elucidating the structure of a compound is to obtain a mass spectrum and subsequent...
Main Authors: | Litsa, Eleni E.; Chenthamarakshan, Vijil; Das, Payel; Kavraki, Lydia E. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Nature Publishing Group UK, 2023 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10290119/ https://www.ncbi.nlm.nih.gov/pubmed/37353554 http://dx.doi.org/10.1038/s42004-023-00932-3 |
_version_ | 1785062423718264832 |
---|---|
author | Litsa, Eleni E. Chenthamarakshan, Vijil Das, Payel Kavraki, Lydia E. |
author_sort | Litsa, Eleni E. |
collection | PubMed |
description | Elucidating the structure of a chemical compound is a fundamental task in chemistry, with applications in multiple domains including drug discovery, precision medicine, and biomarker discovery. The common practice for elucidating the structure of a compound is to obtain a mass spectrum and subsequently retrieve its structure from spectral databases. However, these methods fail for novel molecules that are not present in the reference database. We propose Spec2Mol, a deep learning architecture for molecular structure recommendation given mass spectra alone. Spec2Mol is inspired by the Speech2Text deep learning architectures for translating audio signals into text. Our approach is based on an encoder-decoder architecture. The encoder learns the spectra embeddings, while the decoder, pre-trained on a massive dataset of chemical structures for translating between different molecular representations, reconstructs SMILES sequences of the recommended chemical structures. We have evaluated Spec2Mol by assessing the molecular similarity between the recommended structures and the original structure. Our analysis showed that Spec2Mol is able to identify the presence of key molecular substructures from a compound's mass spectrum, and performs on par with existing fragmentation tree methods, particularly when test structure information is not available during training or present in the reference database. |
format | Online Article Text |
id | pubmed-10290119 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | Nature Publishing Group UK |
record_format | MEDLINE/PubMed |
spelling | pubmed-10290119 2023-06-25 An end-to-end deep learning framework for translating mass spectra to de-novo molecules Litsa, Eleni E. Chenthamarakshan, Vijil Das, Payel Kavraki, Lydia E. Commun Chem Article Nature Publishing Group UK 2023-06-23 /pmc/articles/PMC10290119/ /pubmed/37353554 http://dx.doi.org/10.1038/s42004-023-00932-3 Text en © The Author(s) 2023 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/. |
spellingShingle | Article Litsa, Eleni E. Chenthamarakshan, Vijil Das, Payel Kavraki, Lydia E. An end-to-end deep learning framework for translating mass spectra to de-novo molecules |
title | An end-to-end deep learning framework for translating mass spectra to de-novo molecules |
title_sort | end-to-end deep learning framework for translating mass spectra to de-novo molecules |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10290119/ https://www.ncbi.nlm.nih.gov/pubmed/37353554 http://dx.doi.org/10.1038/s42004-023-00932-3 |