Spec2Vec: Improved mass spectral similarity scoring through learning of structural relationships

Spectral similarity is used as a proxy for structural similarity in many tandem mass spectrometry (MS/MS) based metabolomics analyses such as library matching and molecular networking. Although weaknesses in the relationship between spectral similarity scores and the true structural similarities hav...

Bibliographic Details
Main Authors: Huber, Florian; Ridder, Lars; Verhoeven, Stefan; Spaaks, Jurriaan H.; Diblen, Faruk; Rogers, Simon; van der Hooft, Justin J. J.
Format: Online Article Text
Language: English
Published: Public Library of Science 2021
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7909622/
https://www.ncbi.nlm.nih.gov/pubmed/33591968
http://dx.doi.org/10.1371/journal.pcbi.1008724
_version_ 1783655966092296192
author Huber, Florian
Ridder, Lars
Verhoeven, Stefan
Spaaks, Jurriaan H.
Diblen, Faruk
Rogers, Simon
van der Hooft, Justin J. J.
author_sort Huber, Florian
collection PubMed
description Spectral similarity is used as a proxy for structural similarity in many tandem mass spectrometry (MS/MS) based metabolomics analyses such as library matching and molecular networking. Although weaknesses in the relationship between spectral similarity scores and the true structural similarities have been described, little development of alternative scores has been undertaken. Here, we introduce Spec2Vec, a novel spectral similarity score inspired by a natural language processing algorithm—Word2Vec. Spec2Vec learns fragmental relationships within a large set of spectral data to derive abstract spectral embeddings that can be used to assess spectral similarities. Using data derived from GNPS MS/MS libraries including spectra for nearly 13,000 unique molecules, we show how Spec2Vec scores correlate better with structural similarity than cosine-based scores. We demonstrate the advantages of Spec2Vec in library matching and molecular networking. Spec2Vec is computationally more scalable allowing structural analogue searches in large databases within seconds.
format Online
Article
Text
id pubmed-7909622
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-79096222021-03-05 Spec2Vec: Improved mass spectral similarity scoring through learning of structural relationships Huber, Florian Ridder, Lars Verhoeven, Stefan Spaaks, Jurriaan H. Diblen, Faruk Rogers, Simon van der Hooft, Justin J. J. PLoS Comput Biol Research Article Spectral similarity is used as a proxy for structural similarity in many tandem mass spectrometry (MS/MS) based metabolomics analyses such as library matching and molecular networking. Although weaknesses in the relationship between spectral similarity scores and the true structural similarities have been described, little development of alternative scores has been undertaken. Here, we introduce Spec2Vec, a novel spectral similarity score inspired by a natural language processing algorithm—Word2Vec. Spec2Vec learns fragmental relationships within a large set of spectral data to derive abstract spectral embeddings that can be used to assess spectral similarities. Using data derived from GNPS MS/MS libraries including spectra for nearly 13,000 unique molecules, we show how Spec2Vec scores correlate better with structural similarity than cosine-based scores. We demonstrate the advantages of Spec2Vec in library matching and molecular networking. Spec2Vec is computationally more scalable allowing structural analogue searches in large databases within seconds. Public Library of Science 2021-02-16 /pmc/articles/PMC7909622/ /pubmed/33591968 http://dx.doi.org/10.1371/journal.pcbi.1008724 Text en © 2021 Huber et al http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/) , which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title Spec2Vec: Improved mass spectral similarity scoring through learning of structural relationships
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7909622/
https://www.ncbi.nlm.nih.gov/pubmed/33591968
http://dx.doi.org/10.1371/journal.pcbi.1008724