
Denoising Autoencoder Normalization for Large-Scale Untargeted Metabolomics by Gas Chromatography–Mass Spectrometry

Large-scale metabolomics assays are widely used in epidemiology for biomarker discovery and risk assessments. However, systematic errors introduced by instrumental signal drifting pose a big challenge in large-scale assays, especially for derivatization-based gas chromatography–mass spectrometry (GC...


Bibliographic Details
Main authors: Zhang, Ying; Fan, Sili; Wohlgemuth, Gert; Fiehn, Oliver
Format: Online Article Text
Language: English
Published: MDPI 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10456436/
https://www.ncbi.nlm.nih.gov/pubmed/37623887
http://dx.doi.org/10.3390/metabo13080944
_version_ 1785096698363641856
author Zhang, Ying
Fan, Sili
Wohlgemuth, Gert
Fiehn, Oliver
collection PubMed
description Large-scale metabolomics assays are widely used in epidemiology for biomarker discovery and risk assessments. However, systematic errors introduced by instrumental signal drifting pose a big challenge in large-scale assays, especially for derivatization-based gas chromatography–mass spectrometry (GC–MS). Here, we compare the results of different normalization methods for a study with more than 4000 human plasma samples involved in a type 2 diabetes cohort study, in addition to 413 pooled quality control (QC) samples, 413 commercial pooled plasma samples, and a set of 25 stable isotope-labeled internal standards used for every sample. Data acquisition was conducted across 1.2 years, including seven column changes. In total, 413 pooled QC (training) and 413 BioIVT samples (validation) were used for normalization comparisons. Surprisingly, neither internal standards nor sum-based normalizations yielded median precision of less than 30% across all 563 metabolite annotations. While the machine-learning-based SERRF algorithm gave 19% median precision based on the pooled quality control samples, external cross-validation with BioIVT plasma pools yielded a median 34% relative standard deviation (RSD). We developed a new method: systematic error reduction by denoising autoencoder (SERDA). SERDA lowered the median standard deviations of the training QC samples down to 16% RSD, yielding an overall error of 19% RSD when applied to the independent BioIVT validation QC samples. This is the largest study on GC–MS metabolomics ever reported, demonstrating that technical errors can be normalized and handled effectively for this assay. SERDA was further validated on two additional large-scale GC–MS-based human plasma metabolomics studies, confirming the superior performance of SERDA over SERRF or sum normalizations.
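The precision figures quoted in the description are median relative standard deviations (RSD) over metabolites, measured on repeated injections of a pooled sample. The following is a minimal sketch of that metric only, not of SERDA or SERRF. The function name `median_rsd`, the table layout (rows are QC injections, columns are metabolites), and the example values are assumptions made for illustration and do not come from the article.

```python
from statistics import mean, median, stdev


def median_rsd(qc_table):
    """Median relative standard deviation (in percent) across metabolites.

    qc_table: list of rows, one row per QC injection, one column per
    metabolite (assumed layout). Column means are assumed to be positive.
    """
    rsds = []
    for column in zip(*qc_table):  # iterate metabolite by metabolite
        # RSD = sample standard deviation divided by the mean, as a percentage
        rsds.append(100.0 * stdev(column) / mean(column))
    return median(rsds)


# Hypothetical intensities: three QC injections, three metabolites.
qc = [
    [100.0, 50.0, 10.0],
    [110.0, 40.0, 10.0],
    [90.0, 60.0, 10.0],
]
print(median_rsd(qc))  # per-metabolite RSDs are 10%, 20% and 0%
```

Computing the same value on a held-out pool (the BioIVT samples in the description) rather than on the QC samples used for training is what separates the validation figures from the training figures.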
format Online
Article
Text
id pubmed-10456436
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-10456436 2023-08-26 Metabolites Article MDPI 2023-08-13 /pmc/articles/PMC10456436/ /pubmed/37623887 http://dx.doi.org/10.3390/metabo13080944 Text en © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title Denoising Autoencoder Normalization for Large-Scale Untargeted Metabolomics by Gas Chromatography–Mass Spectrometry
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10456436/
https://www.ncbi.nlm.nih.gov/pubmed/37623887
http://dx.doi.org/10.3390/metabo13080944