
MS2DeepScore: a novel deep learning similarity measure to compare tandem mass spectra


Bibliographic Details
Main Authors: Huber, Florian; van der Burg, Sven; van der Hooft, Justin J. J.; Ridder, Lars
Format: Online Article Text
Language: English
Published: Springer International Publishing 2021
Subjects: Methodology
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8556919/
https://www.ncbi.nlm.nih.gov/pubmed/34715914
http://dx.doi.org/10.1186/s13321-021-00558-4
author Huber, Florian
van der Burg, Sven
van der Hooft, Justin J. J.
Ridder, Lars
collection PubMed
description Mass spectrometry data is one of the key sources of information in many workflows in medicine and across the life sciences. Mass fragmentation spectra are generally considered to be characteristic signatures of the chemical compound they originate from, yet the chemical structure itself usually cannot be easily deduced from the spectrum. Often, spectral similarity measures are used as a proxy for structural similarity, but this approach is strongly limited by a generally poor correlation between the two. Here, we propose MS2DeepScore: a novel Siamese neural network to predict the structural similarity between two chemical structures solely based on their MS/MS fragmentation spectra. Using a cleaned dataset of more than 100,000 mass spectra of about 15,000 unique known compounds, we trained MS2DeepScore to predict structural similarity scores for spectrum pairs with high accuracy. In addition, sampling different model variants through Monte-Carlo Dropout is used to further improve the predictions and to assess the model's prediction uncertainty. On 3600 spectra of 500 unseen compounds, MS2DeepScore is able to identify highly reliable structural matches and to predict Tanimoto scores for pairs of molecules based on their fragment spectra with a root mean squared error of about 0.15. The prediction uncertainty estimate can also be used to select a subset of predictions with a root mean squared error of about 0.1. Furthermore, we demonstrate that MS2DeepScore outperforms classical spectral similarity measures in retrieving chemically related compound pairs from large mass spectral datasets, thereby illustrating its potential for spectral library matching. Finally, MS2DeepScore can be used to create chemically meaningful mass spectral embeddings that could be used to cluster large numbers of spectra. Considered alongside the recently introduced unsupervised Spec2Vec metric, MS2DeepScore leads us to believe that machine learning-supported mass spectral similarity measures have great potential for a range of metabolomics data processing pipelines.
SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s13321-021-00558-4.
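The quantities named in the description (Tanimoto scores, root mean squared error, and uncertainty-based selection of predictions) can be illustrated with a minimal Python sketch. It is not the authors' implementation and does not use the MS2DeepScore package. The function names, the toy fingerprints, the sampled prediction values, and the use of the standard deviation as the spread measure are all assumptions made for illustration only.

```python
import math
import statistics


def tanimoto(bits_a, bits_b):
    """Tanimoto (Jaccard) score between two collections of 'on' fingerprint bits."""
    a, b = set(bits_a), set(bits_b)
    union = a | b
    if not union:
        return 0.0  # convention assumed here for two empty fingerprints
    return len(a & b) / len(union)


def rmse(predicted, reference):
    """Root mean squared error between two equally long lists of scores."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - r) ** 2 for p, r in zip(predicted, reference)]
    return math.sqrt(sum(squared) / len(squared))


def summarize_samples(samples):
    """Collapse repeated stochastic predictions for one spectrum pair into
    (mean, spread). The spread is the population standard deviation, which is
    an assumed choice and not necessarily the measure used in the paper."""
    return statistics.mean(samples), statistics.pstdev(samples)


def select_confident(summaries, max_spread):
    """Return the indices of the pairs whose spread does not exceed max_spread."""
    return [i for i, (_, spread) in enumerate(summaries) if spread <= max_spread]


if __name__ == "__main__":
    # Hypothetical reference scores computed from toy fingerprints.
    reference = [
        tanimoto({1, 2, 3, 4}, {3, 4, 5, 6}),
        tanimoto({1, 2}, {1, 2, 7}),
    ]
    # Hypothetical predictions, e.g. repeated forward passes with dropout active.
    sampled = [[0.30, 0.35, 0.40], [0.20, 0.60, 0.90]]
    summaries = [summarize_samples(s) for s in sampled]
    means = [mean for mean, _ in summaries]
    print("RMSE, all pairs:", rmse(means, reference))

    keep = select_confident(summaries, max_spread=0.1)
    print(
        "RMSE, confident pairs:",
        rmse([means[i] for i in keep], [reference[i] for i in keep]),
    )
```

Filtering on the spread trades coverage for reliability: fewer pairs are scored, but those that remain tend to have lower error, which is the effect the description reports when the root mean squared error drops from about 0.15 to about 0.1.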
format Online
Article
Text
id pubmed-8556919
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Springer International Publishing
record_format MEDLINE/PubMed
spelling pubmed-8556919 2021-11-01 MS2DeepScore: a novel deep learning similarity measure to compare tandem mass spectra Huber, Florian; van der Burg, Sven; van der Hooft, Justin J. J.; Ridder, Lars J Cheminform Methodology Springer International Publishing 2021-10-29 /pmc/articles/PMC8556919/ /pubmed/34715914 http://dx.doi.org/10.1186/s13321-021-00558-4 Text en
© The Author(s) 2021. Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title MS2DeepScore: a novel deep learning similarity measure to compare tandem mass spectra
topic Methodology