
MAD HATTER Correctly Annotates 98% of Small Molecule Tandem Mass Spectra Searching in PubChem

Metabolites provide a direct functional signature of cellular state. Untargeted metabolomics usually relies on mass spectrometry, a technology capable of detecting thousands of compounds in a biological sample. Metabolite annotation is executed using tandem mass spectrometry. Spectral library search...


Bibliographic Details
Main authors: Hoffmann, Martin A., Kretschmer, Fleming, Ludwig, Marcus, Böcker, Sebastian
Format: Online Article Text
Language: English
Published: MDPI 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10053663/
https://www.ncbi.nlm.nih.gov/pubmed/36984753
http://dx.doi.org/10.3390/metabo13030314
_version_ 1785015467344134144
author Hoffmann, Martin A.
Kretschmer, Fleming
Ludwig, Marcus
Böcker, Sebastian
collection PubMed
description Metabolites provide a direct functional signature of cellular state. Untargeted metabolomics usually relies on mass spectrometry, a technology capable of detecting thousands of compounds in a biological sample. Metabolite annotation is executed using tandem mass spectrometry. Spectral library search is far from comprehensive, and numerous compounds remain unannotated. So-called in silico methods allow us to overcome the restrictions of spectral libraries, by searching in much larger molecular structure databases. Yet, after more than a decade of method development, in silico methods still do not reach the correct annotation rates that users would wish for. Here, we present a novel computational method called Mad Hatter for this task. Mad Hatter combines CSI:FingerID results with information from the searched structure database via a metascore. Compound information includes the melting point, and the number of words in the compound description starting with the letter ‘u’. We then show that Mad Hatter reaches a stunning 97.6% correct annotations when searching PubChem, one of the largest and most comprehensive molecular structure databases. Unfortunately, Mad Hatter is not a real method. Rather, we developed Mad Hatter solely for the purpose of demonstrating common issues in computational method development and evaluation. We explain what evaluation glitches were necessary for Mad Hatter to reach this annotation level, what is wrong with similar metascores in general, and why metascores may screw up not only method evaluations but also the analysis of biological experiments. This paper may serve as an example of problems in the development and evaluation of machine learning models for metabolite annotation.
format Online
Article
Text
id pubmed-10053663
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-10053663 2023-03-30 Metabolites Article
MDPI 2023-02-21 /pmc/articles/PMC10053663/ /pubmed/36984753 http://dx.doi.org/10.3390/metabo13030314 Text en © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title MAD HATTER Correctly Annotates 98% of Small Molecule Tandem Mass Spectra Searching in PubChem
topic Article