Characterizing Families of Spectral Similarity Scores and Their Use Cases for Gas Chromatography–Mass Spectrometry Small Molecule Identification

Metabolomics provides a unique snapshot into the world of small molecules and the complex biological processes that govern the human, animal, plant, and environmental ecosystems encapsulated by the One Health modeling framework. However, this “molecular snapshot” is only as informative as the number...

Bibliographic Details
Main Authors: Degnan, David J., Flores, Javier E., Brayfindley, Eva R., Paurus, Vanessa L., Webb-Robertson, Bobbie-Jo M., Clendinen, Chaevien S., Bramer, Lisa M.
Format: Online Article Text
Language: English
Published: MDPI 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10608912/
https://www.ncbi.nlm.nih.gov/pubmed/37887426
http://dx.doi.org/10.3390/metabo13101101
_version_ 1785127889801314304
author Degnan, David J.
Flores, Javier E.
Brayfindley, Eva R.
Paurus, Vanessa L.
Webb-Robertson, Bobbie-Jo M.
Clendinen, Chaevien S.
Bramer, Lisa M.
collection PubMed
description Metabolomics provides a unique snapshot into the world of small molecules and the complex biological processes that govern the human, animal, plant, and environmental ecosystems encapsulated by the One Health modeling framework. However, this “molecular snapshot” is only as informative as the number of metabolites confidently identified within it. The spectral similarity (SS) score is traditionally used to identify compound(s) in mass spectrometry approaches to metabolomics, where spectra are matched to reference libraries of candidate spectra. Unfortunately, there is little consensus on which of the dozens of available SS metrics should be used. This lack of standard SS score creates analytic uncertainty and potentially leads to issues in reproducibility, especially as these data are integrated across other domains. In this work, we use metabolomic spectral similarity as a case study to showcase the challenges in consistency within just one piece of the One Health framework that must be addressed to enable data science approaches for One Health problems. Here, using a large cohort of datasets comprising both standard and complex datasets with expert-verified truth annotations, we evaluated the effectiveness of 66 similarity metrics to delineate between correct matches (true positives) and incorrect matches (true negatives). We additionally characterize the families of these metrics to make informed recommendations for their use. Our results indicate that specific families of metrics (the Inner Product, Correlative, and Intersection families of scores) tend to perform better than others, with no single similarity metric performing optimally for all queried spectra. This work and its findings provide an empirically-based resource for researchers to use in their selection of similarity metrics for GC-MS identification, increasing scientific reproducibility through taking steps towards standardizing identification workflows.
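As a rough illustration of the library matching the abstract describes, the sketch below scores a query spectrum against candidate reference spectra with cosine similarity, a normalized inner product. It is only an assumed stand-in for one Inner Product style score, not the authors' implementation or any of the 66 metrics as they define them. The function names, the dictionary representation (nominal m/z mapped to intensity), and the toy spectra are all hypothetical.

```python
import math


def cosine_similarity(query, reference):
    """Cosine (normalized inner product) score between two spectra.

    Each spectrum is a dict mapping nominal m/z to intensity
    (a hypothetical representation chosen for this sketch).
    """
    shared = set(query) & set(reference)
    dot = sum(query[mz] * reference[mz] for mz in shared)
    norm_q = math.sqrt(sum(v * v for v in query.values()))
    norm_r = math.sqrt(sum(v * v for v in reference.values()))
    if norm_q == 0 or norm_r == 0:
        return 0.0  # an empty spectrum cannot match anything
    return dot / (norm_q * norm_r)


def rank_candidates(query, library):
    """Return (name, score) pairs sorted from best to worst match."""
    scores = [(name, cosine_similarity(query, spec))
              for name, spec in library.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy data, invented for illustration only.
query = {73: 100.0, 147: 45.0, 217: 20.0}
library = {
    "candidate_A": {73: 95.0, 147: 50.0, 217: 18.0},
    "candidate_B": {57: 100.0, 71: 60.0, 85: 30.0},
}
ranked = rank_candidates(query, library)
```

Here `candidate_B` shares no peaks with the query, so its score is zero and it ranks last. Swapping `cosine_similarity` for a different score changes only the ranking function's inner call, which is the kind of choice the abstract says currently lacks consensus.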
format Online
Article
Text
id pubmed-10608912
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-10608912 2023-10-28 Characterizing Families of Spectral Similarity Scores and Their Use Cases for Gas Chromatography–Mass Spectrometry Small Molecule Identification Degnan, David J. Flores, Javier E. Brayfindley, Eva R. Paurus, Vanessa L. Webb-Robertson, Bobbie-Jo M. Clendinen, Chaevien S. Bramer, Lisa M. Metabolites Article MDPI 2023-10-21 /pmc/articles/PMC10608912/ /pubmed/37887426 http://dx.doi.org/10.3390/metabo13101101 Text en © 2023 by the authors. https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title Characterizing Families of Spectral Similarity Scores and Their Use Cases for Gas Chromatography–Mass Spectrometry Small Molecule Identification
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10608912/
https://www.ncbi.nlm.nih.gov/pubmed/37887426
http://dx.doi.org/10.3390/metabo13101101