
In Search of Disentanglement in Tandem Mass Spectrometry Datasets


Bibliographic Details
Main authors: Abram, Krzysztof Jan; McCloskey, Douglas
Format: Online Article Text
Language: English
Published: MDPI 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10526774/
https://www.ncbi.nlm.nih.gov/pubmed/37759743
http://dx.doi.org/10.3390/biom13091343
collection PubMed
description Generative modeling and representation learning of tandem mass spectrometry data aim to learn an interpretable and instrument-agnostic digital representation of metabolites directly from MS/MS spectra. Interpretable and instrument-agnostic digital representations would facilitate comparisons of MS/MS spectra between instrument vendors and enable better and more accurate queries of large MS/MS spectra databases for metabolite identification. In this study, we apply generative modeling and representation learning using variational autoencoders to understand the extent to which tandem mass spectra can be disentangled into their factors of generation (e.g., collision energy, ionization mode, instrument type, etc.) with minimal prior knowledge of the factors. We find that variational autoencoders can disentangle tandem mass spectra data with the proper choice of hyperparameters into meaningful latent representations aligned with known factors of variation. We develop a two-step approach to facilitate the selection of models that are disentangled, which could be applied to other complex and high-dimensional data sets.
id pubmed-10526774
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-10526774 2023-09-28 In Search of Disentanglement in Tandem Mass Spectrometry Datasets Abram, Krzysztof Jan McCloskey, Douglas Biomolecules Article MDPI 2023-09-04 /pmc/articles/PMC10526774/ /pubmed/37759743 http://dx.doi.org/10.3390/biom13091343 Text en © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title In Search of Disentanglement in Tandem Mass Spectrometry Datasets
topic Article