
Picky with peakpicking: assessing chromatographic peak quality with simple metrics in metabolomics

BACKGROUND: Chromatographic peakpicking continues to represent a significant bottleneck in automated LC–MS workflows. Uncontrolled false discovery rates and the lack of manually-calibrated quality metrics require researchers to visually evaluate individual peaks, requiring large amounts of time and...


Bibliographic Details
Main Authors: Kumler, William; Hazelton, Bryna J.; Ingalls, Anitra E.
Format: Online Article Text
Language: English
Published: BioMed Central 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10612323/
https://www.ncbi.nlm.nih.gov/pubmed/37891484
http://dx.doi.org/10.1186/s12859-023-05533-4
author Kumler, William
Hazelton, Bryna J.
Ingalls, Anitra E.
collection PubMed
description BACKGROUND: Chromatographic peakpicking continues to represent a significant bottleneck in automated LC–MS workflows. Uncontrolled false discovery rates and the lack of manually-calibrated quality metrics require researchers to visually evaluate individual peaks, requiring large amounts of time and breaking replicability. This problem is exacerbated in noisy environmental datasets and for novel separation methods such as hydrophilic interaction columns in metabolomics, creating a demand for a simple, intuitive, and robust metric of peak quality.
RESULTS: Here, we manually labeled four HILIC oceanographic particulate metabolite datasets to assess the performance of individual peak quality metrics. We used these datasets to construct a predictive model calibrated to the likelihood that visual inspection by an MS expert would include a given mass feature in the downstream analysis. We implemented two novel peak quality metrics, a custom signal-to-noise metric and a test of similarity to a bell curve, both calculated from the raw data in the extracted ion chromatogram, and found that these outperformed existing measurements of peak quality. A simple logistic regression model built on two metrics reduced the fraction of false positives in the analysis from 70–80% down to 1–5% and showed minimal overfitting when applied to novel datasets. We then explored the implications of this quality thresholding on the conclusions obtained by the downstream analysis and found that while only 10% of the variance in the dataset could be explained by depth in the default output from the peakpicker, approximately 40% of the variance was explained when restricted to high-quality peaks alone.
CONCLUSIONS: We conclude that the poor performance of peakpicking algorithms significantly reduces the power of both univariate and multivariate statistical analyses to detect environmental differences. We demonstrate that simple models built on intuitive metrics and derived from the raw data are more robust and can outperform more complex models when applied to new data. Finally, we show that in properly curated datasets, depth is a major driver of variability in the marine microbial metabolome and identify several interesting metabolite trends for future investigation.
SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12859-023-05533-4.
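The two metrics and the logistic model named in the abstract can be illustrated with a rough sketch. The code below is not the authors' implementation: the Gaussian reference curve, the +/- 3 sigma window assumption, the function names, and the logistic coefficients are all hypothetical choices made here for illustration. It scores an extracted ion chromatogram by (1) its Pearson correlation with an idealized bell curve and (2) peak height divided by the spread of the residuals around that curve, then combines both scores in a logistic function.

```python
import numpy as np


def _bell_curve(rt, intensity):
    # Idealized Gaussian centered on the tallest scan.
    # Assumption: the extraction window spans roughly +/- 3 standard deviations.
    center = rt[np.argmax(intensity)]
    sigma = (rt.max() - rt.min()) / 6.0
    return np.exp(-0.5 * ((rt - center) / sigma) ** 2)


def peak_shape_similarity(rt, intensity):
    """Pearson correlation between the chromatogram and the bell curve."""
    rt = np.asarray(rt, dtype=float)
    intensity = np.asarray(intensity, dtype=float)
    if np.std(intensity) == 0:
        return 0.0  # a flat trace has no peak shape to compare
    return float(np.corrcoef(intensity, _bell_curve(rt, intensity))[0, 1])


def peak_signal_to_noise(rt, intensity):
    """Peak height over the standard deviation of residuals from the scaled bell curve."""
    rt = np.asarray(rt, dtype=float)
    intensity = np.asarray(intensity, dtype=float)
    baseline = intensity.min()
    height = intensity.max() - baseline
    if height == 0:
        return 0.0
    fitted = baseline + height * _bell_curve(rt, intensity)
    noise = np.std(intensity - fitted)
    return float(height / noise) if noise > 0 else float("inf")


def keep_probability(shape, snr, coefs=(-8.0, 8.0, 2.0)):
    """Logistic model on the two metrics; the coefficients are hypothetical, not fitted."""
    b0, b1, b2 = coefs
    z = b0 + b1 * shape + b2 * np.log10(max(snr, 1e-9))
    return float(1.0 / (1.0 + np.exp(-z)))


# Synthetic data: one clean peak and one noise-only trace (seeded for repeatability).
rng = np.random.default_rng(0)
rt = np.linspace(0.0, 60.0, 61)
good_peak = 1000.0 * np.exp(-0.5 * ((rt - 30.0) / 10.0) ** 2) + rng.normal(0.0, 5.0, rt.size)
noise_only = rng.normal(100.0, 10.0, rt.size)

for name, eic in (("good peak", good_peak), ("noise only", noise_only)):
    shape = peak_shape_similarity(rt, eic)
    snr = peak_signal_to_noise(rt, eic)
    print(name, round(shape, 3), round(snr, 1), round(keep_probability(shape, snr), 3))
```

In practice the coefficients would be fitted to manually labeled features, as the abstract describes, rather than set by hand as they are here.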
format Online
Article
Text
id pubmed-10612323
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-10612323 2023-10-29 Picky with peakpicking: assessing chromatographic peak quality with simple metrics in metabolomics Kumler, William Hazelton, Bryna J. Ingalls, Anitra E. BMC Bioinformatics Research BioMed Central 2023-10-28 /pmc/articles/PMC10612323/ /pubmed/37891484 http://dx.doi.org/10.1186/s12859-023-05533-4 Text en © The Author(s) 2023 https://creativecommons.org/licenses/by/4.0/
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title Picky with peakpicking: assessing chromatographic peak quality with simple metrics in metabolomics
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10612323/
https://www.ncbi.nlm.nih.gov/pubmed/37891484
http://dx.doi.org/10.1186/s12859-023-05533-4