
Deep kernel learning improves molecular fingerprint prediction from tandem mass spectra

Bibliographic Details
Main author: Dührkop, Kai
Format: Online Article Text
Language: English
Published: Oxford University Press 2022
Subjects: ISCB/ISMB 2022
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9235503/
https://www.ncbi.nlm.nih.gov/pubmed/35758813
http://dx.doi.org/10.1093/bioinformatics/btac260
Collection: PubMed
Description:
MOTIVATION: Untargeted metabolomics experiments rely on spectral libraries for structure annotation, but these libraries are vastly incomplete; in silico methods overcome this limitation by searching structure databases instead. The best-performing in silico methods use machine learning to predict a molecular fingerprint from a tandem mass spectrum, then use the predicted fingerprint to search a molecular structure database. Predicted molecular fingerprints are also of great interest for compound class annotation, de novo structure elucidation and other tasks. So far, kernel support vector machines are the best tool for fingerprint prediction. However, they cannot be trained on all publicly available reference spectra, because their training time scales cubically with the number of training examples.
RESULTS: We use the Nyström approximation to transform the kernel into a linear feature map. We evaluate two methods that use this feature map as input: a linear support vector machine and a deep neural network (DNN). For evaluation, we use a cross-validated dataset of 156 017 compounds and three independent datasets with 1734 compounds. We show that the combination of kernel method and DNN outperforms both the kernel support vector machine, the current gold standard, and a DNN operating directly on tandem mass spectra, on all evaluation datasets.
AVAILABILITY AND IMPLEMENTATION: The deep kernel learning method for fingerprint prediction is part of the SIRIUS software, available at https://bio.informatik.uni-jena.de/software/sirius.
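The Nyström step described under RESULTS can be illustrated with a short Python sketch. It picks m landmark inputs, eigendecomposes their m x m kernel matrix K_mm = V Λ V^T and maps every input x to the explicit features phi(x) = k_m(x)^T V Λ^(-1/2), so that phi(x) . phi(y) approximates k(x, y). Building the features needs only the n x m kernel block and one m x m eigendecomposition, roughly O(nm² + m³), instead of a solver whose cost is cubic in n. Assumptions not taken from the paper: a Gaussian RBF kernel on toy fixed-length vectors stands in for the MS/MS kernels used for fingerprint prediction, landmarks are drawn at random, and a multi-label logistic regression (one output per fingerprint bit) stands in for the linear SVM and the DNN. All function names are hypothetical; this is a sketch, not the SIRIUS implementation.

```python
import numpy as np

def rbf_kernel(A, B, gamma=0.1):
    """Stand-in kernel (assumption): Gaussian RBF on fixed-length vectors.
    The paper's kernels compare MS/MS data; any PSD kernel fits this sketch."""
    sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def nystroem_fit(landmarks, kernel, rel_eps=1e-10):
    """Return P = V diag(lambda^-1/2) from the eigendecomposition of K_mm."""
    K_mm = kernel(landmarks, landmarks)
    vals, vecs = np.linalg.eigh(K_mm)          # K_mm is symmetric PSD
    keep = vals > rel_eps * vals.max()         # drop numerically zero eigenvalues
    return vecs[:, keep] / np.sqrt(vals[keep])

def nystroem_transform(X, landmarks, proj, kernel):
    """Explicit features phi(x) = k_m(x)^T P, so phi(x) . phi(y) ~= k(x, y)."""
    return kernel(X, landmarks) @ proj

def predict_proba(Phi, W, b):
    """Per-bit probabilities of the predicted fingerprint."""
    return 1.0 / (1.0 + np.exp(-(Phi @ W + b)))

def train_multilabel_logreg(Phi, Y, lr=0.1, epochs=200):
    """Stand-in for the linear SVM / DNN (assumption): one logistic output per
    fingerprint bit, fitted by full-batch gradient descent on the binary
    cross-entropy summed over bits and averaged over compounds."""
    n, r = Phi.shape
    W = np.zeros((r, Y.shape[1]))
    b = np.zeros(Y.shape[1])
    for _ in range(epochs):
        P = predict_proba(Phi, W, b)
        G = (P - Y) / n                        # gradient of the loss w.r.t. the logits
        W -= lr * Phi.T @ G
        b -= lr * G.sum(axis=0)
    return W, b

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 4))              # toy "spectra" (hypothetical data)
    Y = (X[:, :3] > 0).astype(float)           # toy 3-bit "fingerprints"
    landmarks = X[rng.choice(len(X), size=100, replace=False)]
    proj = nystroem_fit(landmarks, rbf_kernel)
    Phi = nystroem_transform(X, landmarks, proj, rbf_kernel)
    err = np.abs(Phi @ Phi.T - rbf_kernel(X, X)).max()
    print("max kernel approximation error:", err)
    W, b = train_multilabel_logreg(Phi, Y)
    print("bit probabilities, first compound:", predict_proba(Phi[:1], W, b))
```

Replacing train_multilabel_logreg with a small neural network trained on the same Phi matrix would correspond, roughly, to the kernel-plus-DNN combination the abstract reports; the feature map itself does not change.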
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: Bioinformatics (ISCB/ISMB 2022), Oxford University Press, published 2022-06-27
Copyright: © The Author(s) 2022. Published by Oxford University Press.
License: This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.