
Optimization of Imputation Strategies for High-Resolution Gas Chromatography–Mass Spectrometry (HR GC–MS) Metabolomics Data



Bibliographic Details
Main authors: Ampong, Isaac; Zimmerman, Kip D.; Nathanielsz, Peter W.; Cox, Laura A.; Olivier, Michael
Format: Online Article Text
Language: English
Published: MDPI 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9144635/
https://www.ncbi.nlm.nih.gov/pubmed/35629933
http://dx.doi.org/10.3390/metabo12050429
author Ampong, Isaac
Zimmerman, Kip D.
Nathanielsz, Peter W.
Cox, Laura A.
Olivier, Michael
collection PubMed
description Gas chromatography–coupled mass spectrometry (GC–MS) has been used in biomedical research to analyze volatile, non-polar, and polar metabolites in a wide array of sample types. Despite advances in technology, missing values are still common in metabolomics datasets and must be properly handled. We evaluated the performance of ten commonly used missing value imputation methods with metabolites analyzed on an HR GC–MS instrument. By introducing missing values into the complete (i.e., data without any missing values) National Institute of Standards and Technology (NIST) plasma dataset, we demonstrate that random forest (RF), glmnet ridge regression (GRR), and Bayesian principal component analysis (BPCA) shared the lowest root mean squared error (RMSE) in technical replicate data. Further examination of these three methods in data from baboon plasma and liver samples demonstrated they all maintained high accuracy. Overall, our analysis suggests that any of the three imputation methods can be applied effectively to untargeted metabolomics datasets with high accuracy. However, it is important to note that imputation will alter the correlation structure of the dataset and bias downstream regression coefficients and p-values.
format Online
Article
Text
id pubmed-9144635
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-9144635 2022-05-29 Metabolites Article MDPI 2022-05-11 /pmc/articles/PMC9144635/ /pubmed/35629933 http://dx.doi.org/10.3390/metabo12050429 Text en © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title Optimization of Imputation Strategies for High-Resolution Gas Chromatography–Mass Spectrometry (HR GC–MS) Metabolomics Data
topic Article
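The description field above outlines the evaluation design. Missing values are introduced into a complete dataset, each method imputes them, and root mean squared error (RMSE) is computed on the hidden entries. The abstract also cautions that imputation alters the dataset's correlation structure. The following is a minimal sketch of that design under stated assumptions, not the authors' pipeline:

- The data are synthetic, correlated "metabolites", not the NIST or baboon data.
- Missingness is assumed to be completely at random (MCAR), because the abstract does not state the mechanism.
- A plain iterative ridge-regression imputer stands in for the paper's methods. It is a simplified analogue, not the glmnet, random forest, or BPCA implementations the authors evaluated.
- All function names (`mask_mcar`, `impute_mean`, `impute_ridge`, `rmse_on_masked`, `max_corr_shift`) are hypothetical.

```python
import numpy as np


def mask_mcar(x, frac, rng):
    """Hide a random fraction of entries (assumed MCAR mechanism)."""
    mask = rng.random(x.shape) < frac
    mask[0, :] = False  # keep row 0 observed so every column has data
    x_missing = x.copy()
    x_missing[mask] = np.nan
    return x_missing, mask


def impute_mean(x_missing):
    """Baseline: replace each NaN with its column mean."""
    col_means = np.nanmean(x_missing, axis=0)
    return np.where(np.isnan(x_missing), col_means, x_missing)


def impute_ridge(x_missing, alpha=1.0, n_iter=10):
    """Hypothetical iterative ridge imputer (a simplified regression-based stand-in)."""
    missing = np.isnan(x_missing)
    x = impute_mean(x_missing)  # start from mean-filled values
    p = x.shape[1]
    for _ in range(n_iter):
        for j in range(p):
            miss_j = missing[:, j]
            if not miss_j.any():
                continue
            others = np.delete(np.arange(p), j)
            obs = ~miss_j
            X_obs = x[obs][:, others]
            y_obs = x[obs, j]
            # Center so the intercept is not penalized by the ridge term.
            x_mu, y_mu = X_obs.mean(axis=0), y_obs.mean()
            Xc, yc = X_obs - x_mu, y_obs - y_mu
            beta = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(len(others)), Xc.T @ yc)
            # Overwrite only the originally missing entries of column j.
            x[miss_j, j] = (x[miss_j][:, others] - x_mu) @ beta + y_mu
    return x


def rmse_on_masked(x_true, x_imputed, mask):
    """RMSE computed only over the entries that were hidden."""
    diff = x_true[mask] - x_imputed[mask]
    return float(np.sqrt(np.mean(diff ** 2)))


def max_corr_shift(x_true, x_imputed):
    """Largest absolute change in any pairwise correlation after imputation."""
    c_true = np.corrcoef(x_true, rowvar=False)
    c_imp = np.corrcoef(x_imputed, rowvar=False)
    return float(np.max(np.abs(c_true - c_imp)))


# Synthetic complete dataset: 60 samples x 8 metabolites driven by 2 latent factors.
rng = np.random.default_rng(0)
latent = rng.normal(size=(60, 2))
loadings = rng.normal(size=(2, 8))
x_true = latent @ loadings + 0.3 * rng.normal(size=(60, 8))

x_missing, mask = mask_mcar(x_true, 0.2, rng)
x_mean_imp = impute_mean(x_missing)
x_ridge_imp = impute_ridge(x_missing)

rmse_mean = rmse_on_masked(x_true, x_mean_imp, mask)
rmse_ridge = rmse_on_masked(x_true, x_ridge_imp, mask)
shift_mean = max_corr_shift(x_true, x_mean_imp)
shift_ridge = max_corr_shift(x_true, x_ridge_imp)

print(f"RMSE  mean={rmse_mean:.3f}  ridge={rmse_ridge:.3f}")
print(f"max |corr change|  mean={shift_mean:.3f}  ridge={shift_ridge:.3f}")
```

Scoring RMSE only on the masked entries mirrors the "introduce missing values into complete data" design. The observed values are identical across methods, so including them would only dilute the comparison. The correlation check is a quick way to see the caveat from the abstract. Mean filling tends to weaken correlations, while regression-based filling places imputed points on a fitted surface and can strengthen them.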