
Mechanism-aware imputation: a two-step approach in handling missing values in metabolomics


Bibliographic Details
Main authors: Dekermanjian, Jonathan P., Shaddox, Elin, Nandy, Debmalya, Ghosh, Debashis, Kechris, Katerina
Format: Online, Article, Text
Language: English
Published: BioMed Central, 2022
Subjects: Research
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9109373/
https://www.ncbi.nlm.nih.gov/pubmed/35578165
http://dx.doi.org/10.1186/s12859-022-04659-1
Description:
When analyzing large datasets from high-throughput technologies, researchers often encounter missing quantitative measurements, which are particularly frequent in metabolomics datasets. In metabolomics, the comprehensive profiling of metabolite abundances, measurements are typically made with mass spectrometry technologies that often introduce missingness via multiple mechanisms: (1) the metabolite signal may be smaller than the instrument limit of detection; (2) the conditions under which the data are collected and processed may lead to missing values; (3) missing values can be introduced randomly. Missingness resulting from mechanism (1) would be classified as Missing Not At Random (MNAR), that from mechanism (2) as Missing At Random (MAR), and that from mechanism (3) as Missing Completely At Random (MCAR).

Two common approaches for handling missing data are (1) omitting the missing data from the analysis and (2) imputing the missing values. Both approaches may introduce bias and reduce statistical power in downstream analyses, such as testing metabolite associations with clinical variables. Further, standard imputation methods in metabolomics often ignore the mechanisms causing missingness and inaccurately estimate the missing values within a data set.

We propose a mechanism-aware imputation algorithm that takes a two-step approach to imputing missing values. First, we use a random forest classifier to classify the missingness mechanism of each missing value in the data set. Second, we impute each missing value using imputation algorithms specific to its predicted missingness mechanism (i.e., MAR/MCAR or MNAR). Using complete data, we conducted simulations in which we imposed different missingness patterns on the data and tested the performance of combinations of imputation algorithms.
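The simulation design described above, imposing known missingness patterns on complete data and then scoring imputations against the original values, can be sketched in dependency-free Python. This is a minimal illustration under assumptions, not the authors' simulation code: the detection limit is approximated by a per-metabolite quantile (MNAR), MAR missingness is driven by a hypothetical observed `batch` flag standing in for collection or processing conditions, MCAR values are dropped with a constant probability, and root-mean-square error (RMSE) over the masked cells serves as an example accuracy measure. The function names, rates, and the `batch` argument are all hypothetical.

```python
import math
import random


def impose_missingness(data, batch, lod_quantile=0.1, mar_rate=0.2,
                       mcar_rate=0.05, seed=0):
    """Mask a complete samples-by-metabolites matrix and record why each cell was masked.

    data  : list of rows (samples), each a list of floats (metabolite abundances)
    batch : list of 0/1 flags per sample, a hypothetical observed processing condition
    Requires 0 <= lod_quantile < 1. Returns (masked, labels): `masked` is a copy
    with None in removed cells; `labels` maps (row, col) to "MNAR", "MAR" or "MCAR".
    """
    rng = random.Random(seed)
    n_rows, n_cols = len(data), len(data[0])
    masked = [list(row) for row in data]
    labels = {}

    # MNAR: values below a per-metabolite detection limit are lost. The limit is
    # approximated here (an assumption) by a quantile of that metabolite's values.
    for j in range(n_cols):
        column = sorted(row[j] for row in data)
        lod = column[int(lod_quantile * n_rows)]
        for i in range(n_rows):
            if data[i][j] < lod:
                masked[i][j] = None
                labels[(i, j)] = "MNAR"

    for i in range(n_rows):
        for j in range(n_cols):
            if masked[i][j] is None:
                continue
            # MAR: the chance of loss depends only on the observed batch flag.
            if batch[i] == 1 and rng.random() < mar_rate:
                masked[i][j] = None
                labels[(i, j)] = "MAR"
            # MCAR: any remaining value is lost with the same constant probability.
            elif rng.random() < mcar_rate:
                masked[i][j] = None
                labels[(i, j)] = "MCAR"
    return masked, labels


def rmse_on_masked(original, imputed, labels, mechanism=None):
    """RMSE between imputed and true values over masked cells (optionally one mechanism)."""
    squared = [(imputed[i][j] - original[i][j]) ** 2
               for (i, j), m in labels.items()
               if mechanism is None or m == mechanism]
    return math.sqrt(sum(squared) / len(squared)) if squared else 0.0
```

Because the true mechanism of every masked cell is recorded, an imputer's output can be scored separately per mechanism, for example `rmse_on_masked(complete, imputed, labels, "MNAR")`.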
Our proposed algorithm provided imputations closer to the original data than those obtained by applying a single imputation algorithm to all the missing values. Consequently, our two-step approach was able to reduce bias, improving downstream analyses.

Supplementary information: The online version contains supplementary material available at 10.1186/s12859-022-04659-1.
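The two-step procedure described in the abstract can be illustrated with the dependency-free Python sketch below. It is a minimal sketch under stated assumptions, not the authors' implementation: the per-cell features, the threshold-based `stand_in_classifier` (a placeholder for the trained random forest classifier the abstract describes), and the choice of half the observed minimum for MNAR cells and the observed mean for MAR/MCAR cells are all hypothetical. Step 1 predicts a mechanism for each missing cell; step 2 dispatches each cell to the imputer registered for that mechanism.

```python
import statistics


def cell_features(masked, i, j):
    """Hypothetical features for missing cell (i, j); assumes column j has observed values."""
    column = [row[j] for row in masked]
    observed = [v for v in column if v is not None]
    return {
        "col_missing_frac": 1 - len(observed) / len(column),
        "col_observed_min": min(observed),
        "col_observed_mean": statistics.fmean(observed),
        "row_missing_frac": sum(v is None for v in masked[i]) / len(masked[i]),
    }


def stand_in_classifier(features, cutoff=0.3):
    """Placeholder for a trained random forest (NOT the paper's model):
    flags heavily missing metabolites as MNAR, everything else as MAR/MCAR."""
    return "MNAR" if features["col_missing_frac"] >= cutoff else "MAR/MCAR"


def half_minimum(observed):
    """Left-censored fill for MNAR cells: half the smallest observed value."""
    return min(observed) / 2


def observed_mean(observed):
    """Simple fill for MAR/MCAR cells: mean of the observed values."""
    return statistics.fmean(observed)


def mechanism_aware_impute(masked, classify=stand_in_classifier, imputers=None):
    """Two-step imputation: predict a mechanism per missing cell, then impute by mechanism."""
    if imputers is None:
        imputers = {"MNAR": half_minimum, "MAR/MCAR": observed_mean}
    n_rows, n_cols = len(masked), len(masked[0])

    # Step 1: classify every missing cell from features of the masked data.
    predicted = {
        (i, j): classify(cell_features(masked, i, j))
        for i in range(n_rows)
        for j in range(n_cols)
        if masked[i][j] is None
    }

    # Step 2: fill each cell with the imputer matching its predicted mechanism.
    imputed = [list(row) for row in masked]
    for (i, j), mechanism in predicted.items():
        observed = [row[j] for row in masked if row[j] is not None]
        imputed[i][j] = imputers[mechanism](observed)
    return imputed, predicted
```

Any callable that maps the feature dictionary to `"MNAR"` or `"MAR/MCAR"` can be passed as `classify`, for example a wrapper around a fitted scikit-learn `RandomForestClassifier` that builds a feature vector and calls its `predict` method. Likewise, `imputers` can map each mechanism to a different algorithm, which is one way combinations of imputation methods could be compared.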
Source: PubMed (National Center for Biotechnology Information); record ID pubmed-9109373; record format MEDLINE/PubMed.
Journal: BMC Bioinformatics (Research), BioMed Central, published 2022-05-16.
© The Author(s) 2022. Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.