Addressing Missing Data in GC × GC Metabolomics: Identifying Missingness Type and Evaluating the Impact of Imputation Methods on Experimental Replication

Missing data is a significant issue in metabolomics that is often neglected when conducting data preprocessing, particularly when it comes to imputation. This can have serious implications for downstream statistical analyses and lead to misleading or uninterpretable inferences. In...

Bibliographic Details
Main authors: Davis, Trenton J., Firzli, Tarek R., Higgins Keppler, Emily A., Richardson, Matthew, Bean, Heather D.
Format: Online Article Text
Language: English
Published: American Chemical Society 2022
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9369014/
https://www.ncbi.nlm.nih.gov/pubmed/35881554
http://dx.doi.org/10.1021/acs.analchem.1c04093
_version_ 1784766325837529088
author Davis, Trenton J.
Firzli, Tarek R.
Higgins Keppler, Emily A.
Richardson, Matthew
Bean, Heather D.
collection PubMed
description Missing data is a significant issue in metabolomics that is often neglected during data preprocessing, particularly when it comes to imputation. This can have serious implications for downstream statistical analyses and lead to misleading or uninterpretable inferences. In this study, we aim to identify the primary types of missingness that affect untargeted metabolomics data and compare strategies for imputation using two real-world comprehensive two-dimensional gas chromatography (GC × GC) data sets. We also present these goals in the context of experimental replication, whereby imputation is conducted in a within-replicate-based fashion (the first description and evaluation of this strategy), and introduce an R package, MetabImpute, to carry out these analyses. Our results indicate that, in these two GC × GC data sets, missingness was most likely of the missing at-random (MAR) and missing not-at-random (MNAR) types as opposed to missing completely at-random (MCAR). Gibbs sampler imputation and Random Forest gave the best results when imputing MAR and MNAR data, compared with single-value imputation (zero, minimum, mean, median, and half-minimum) and other more sophisticated approaches (Bayesian principal component analysis and quantile regression imputation for left-censored data). When samples are replicated, within-replicate imputation approaches led to an increase in the reproducibility of peak quantification compared to imputation that ignores replication, suggesting that imputing with respect to replication may preserve potentially important features in downstream analyses for biomarker discovery.
format Online
Article
Text
id pubmed-9369014
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher American Chemical Society
record_format MEDLINE/PubMed
spelling pubmed-9369014 2023-07-26 Addressing Missing Data in GC × GC Metabolomics: Identifying Missingness Type and Evaluating the Impact of Imputation Methods on Experimental Replication Davis, Trenton J. Firzli, Tarek R. Higgins Keppler, Emily A. Richardson, Matthew Bean, Heather D. Anal Chem
American Chemical Society 2022-07-26 2022-08-09 /pmc/articles/PMC9369014/ /pubmed/35881554 http://dx.doi.org/10.1021/acs.analchem.1c04093 Text en © 2022 The Authors. Published by American Chemical Society https://creativecommons.org/licenses/by-nc-nd/4.0/ Permits non-commercial access and re-use, provided that author attribution and integrity are maintained, but does not permit creation of adaptations or other derivative works (https://creativecommons.org/licenses/by-nc-nd/4.0/).
title Addressing Missing Data in GC × GC Metabolomics: Identifying Missingness Type and Evaluating the Impact of Imputation Methods on Experimental Replication
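
The single-value imputation strategies named in the description (zero, minimum, mean, median, and half-minimum) and the idea of imputing within replicate groups can be illustrated with a small sketch. This is not the MetabImpute API (that package is written in R and its interface is not described in this record); the function names `fill_value`, `impute`, and `impute_within_replicates`, the use of `None` for a missing peak, and the example numbers are all assumptions made for illustration.

```python
from statistics import mean, median


def fill_value(observed, method):
    """Return the single constant used to fill gaps (hypothetical helper)."""
    if method == "zero":
        return 0.0
    if method not in ("min", "half_min", "mean", "median"):
        raise ValueError(f"unknown method: {method}")
    if not observed:
        return None  # nothing observed, so no fill value can be derived
    if method == "min":
        return min(observed)
    if method == "half_min":
        return min(observed) / 2
    if method == "mean":
        return mean(observed)
    return median(observed)


def impute(values, method="half_min"):
    """Single-value imputation over one peak, ignoring replication."""
    observed = [v for v in values if v is not None]
    fill = fill_value(observed, method)
    return [fill if v is None else v for v in values]


def impute_within_replicates(values, replicate_ids, method="half_min"):
    """Apply the same imputation separately inside each replicate group.

    Assumption: 'within-replicate' is read here as deriving the fill value
    only from measurements that share a replicate id.
    """
    groups = {}
    for index, rid in enumerate(replicate_ids):
        groups.setdefault(rid, []).append(index)

    result = list(values)
    for indices in groups.values():
        imputed = impute([values[i] for i in indices], method)
        for i, v in zip(indices, imputed):
            result[i] = v
    return result


# Made-up abundances for one peak across two samples, three replicates each.
peak = [10.0, None, 12.0, 100.0, 104.0, None]
replicates = ["A", "A", "A", "B", "B", "B"]

global_mean = impute(peak, "mean")
replicate_mean = impute_within_replicates(peak, replicates, "mean")
```

In this toy case, the global mean fills both gaps with one value drawn from all four observations, while the within-replicate version fills each gap from its own sample's replicates, which keeps the two samples clearly separated.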