Optimization of Imputation Strategies for High-Resolution Gas Chromatography–Mass Spectrometry (HR GC–MS) Metabolomics Data
Gas chromatography–coupled mass spectrometry (GC–MS) has been used in biomedical research to analyze volatile, non-polar, and polar metabolites in a wide array of sample types. Despite advances in technology, missing values are still common in metabolomics datasets and must be properly handled. We e...
Main Authors: | Ampong, Isaac; Zimmerman, Kip D.; Nathanielsz, Peter W.; Cox, Laura A.; Olivier, Michael |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | MDPI 2022 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9144635/ https://www.ncbi.nlm.nih.gov/pubmed/35629933 http://dx.doi.org/10.3390/metabo12050429 |
_version_ | 1784716096352288768 |
---|---|
author | Ampong, Isaac; Zimmerman, Kip D.; Nathanielsz, Peter W.; Cox, Laura A.; Olivier, Michael |
author_sort | Ampong, Isaac |
collection | PubMed |
description | Gas chromatography–coupled mass spectrometry (GC–MS) has been used in biomedical research to analyze volatile, non-polar, and polar metabolites in a wide array of sample types. Despite advances in technology, missing values are still common in metabolomics datasets and must be properly handled. We evaluated the performance of ten commonly used missing value imputation methods with metabolites analyzed on an HR GC–MS instrument. By introducing missing values into the complete (i.e., data without any missing values) National Institute of Standards and Technology (NIST) plasma dataset, we demonstrate that random forest (RF), glmnet ridge regression (GRR), and Bayesian principal component analysis (BPCA) shared the lowest root mean squared error (RMSE) in technical replicate data. Further examination of these three methods in data from baboon plasma and liver samples demonstrated they all maintained high accuracy. Overall, our analysis suggests that any of the three imputation methods can be applied effectively to untargeted metabolomics datasets with high accuracy. However, it is important to note that imputation will alter the correlation structure of the dataset and bias downstream regression coefficients and p-values. |
format | Online Article Text |
id | pubmed-9144635 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | MDPI |
record_format | MEDLINE/PubMed |
spelling | pubmed-9144635 2022-05-29 Optimization of Imputation Strategies for High-Resolution Gas Chromatography–Mass Spectrometry (HR GC–MS) Metabolomics Data Ampong, Isaac Zimmerman, Kip D. Nathanielsz, Peter W. Cox, Laura A. Olivier, Michael Metabolites Article Gas chromatography–coupled mass spectrometry (GC–MS) has been used in biomedical research to analyze volatile, non-polar, and polar metabolites in a wide array of sample types. Despite advances in technology, missing values are still common in metabolomics datasets and must be properly handled. We evaluated the performance of ten commonly used missing value imputation methods with metabolites analyzed on an HR GC–MS instrument. By introducing missing values into the complete (i.e., data without any missing values) National Institute of Standards and Technology (NIST) plasma dataset, we demonstrate that random forest (RF), glmnet ridge regression (GRR), and Bayesian principal component analysis (BPCA) shared the lowest root mean squared error (RMSE) in technical replicate data. Further examination of these three methods in data from baboon plasma and liver samples demonstrated they all maintained high accuracy. Overall, our analysis suggests that any of the three imputation methods can be applied effectively to untargeted metabolomics datasets with high accuracy. However, it is important to note that imputation will alter the correlation structure of the dataset and bias downstream regression coefficients and p-values. MDPI 2022-05-11 /pmc/articles/PMC9144635/ /pubmed/35629933 http://dx.doi.org/10.3390/metabo12050429 Text en © 2022 by the authors. https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). |
title | Optimization of Imputation Strategies for High-Resolution Gas Chromatography–Mass Spectrometry (HR GC–MS) Metabolomics Data |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9144635/ https://www.ncbi.nlm.nih.gov/pubmed/35629933 http://dx.doi.org/10.3390/metabo12050429 |
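
The evaluation outlined in the description (remove values from a complete dataset, impute them, and score the result by RMSE) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the synthetic matrix stands in for a complete samples-by-metabolites table, the 10% missing-completely-at-random rate is arbitrary, and column-mean imputation is only a simple placeholder, not one of the methods (RF, GRR, BPCA) the record names as best performing. The helper names are hypothetical.

```python
import math
import random


def mask_at_random(data, fraction, rng):
    """Copy `data`, replacing roughly `fraction` of entries with None.

    Returns the masked copy and the list of (row, col) positions removed.
    """
    masked = [row[:] for row in data]
    removed = []
    for i, row in enumerate(masked):
        for j in range(len(row)):
            if rng.random() < fraction:
                row[j] = None
                removed.append((i, j))
    return masked, removed


def impute_column_mean(masked):
    """Fill each None with the mean of the observed values in its column."""
    n_cols = len(masked[0])
    means = []
    for j in range(n_cols):
        observed = [row[j] for row in masked if row[j] is not None]
        means.append(sum(observed) / len(observed))
    return [[means[j] if v is None else v for j, v in enumerate(row)]
            for row in masked]


def rmse(truth, imputed, positions):
    """Root mean squared error over the artificially removed entries only."""
    squared = [(truth[i][j] - imputed[i][j]) ** 2 for i, j in positions]
    return math.sqrt(sum(squared) / len(squared))


rng = random.Random(0)
# Synthetic stand-in for a complete samples-by-metabolites intensity matrix.
complete = [[rng.lognormvariate(2.0, 0.5) for _ in range(12)] for _ in range(40)]
masked, removed = mask_at_random(complete, fraction=0.1, rng=rng)
imputed = impute_column_mean(masked)
score = rmse(complete, imputed, removed)
print(f"RMSE over {len(removed)} masked entries: {score:.3f}")
```

Scoring only the removed positions keeps the metric focused on imputation error; swapping `impute_column_mean` for another imputer with the same input and output shape would allow a like-for-like comparison on the same mask.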