A Workflow for Missing Values Imputation of Untargeted Metabolomics Data

Metabolomics studies have seen a steady growth due to the development and implementation of affordable and high-quality metabolomics platforms. In large metabolite panels, measurement values are frequently missing and, if neglected or sub-optimally imputed, can cause biased study results. We provide...

Bibliographic Details
Main Authors: Faquih, Tariq, van Smeden, Maarten, Luo, Jiao, le Cessie, Saskia, Kastenmüller, Gabi, Krumsiek, Jan, Noordam, Raymond, van Heemst, Diana, Rosendaal, Frits R., van Hylckama Vlieg, Astrid, Willems van Dijk, Ko, Mook-Kanamori, Dennis O.
Format: Online Article Text
Language: English
Published: MDPI 2020
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7761057/
https://www.ncbi.nlm.nih.gov/pubmed/33256233
http://dx.doi.org/10.3390/metabo10120486
author Faquih, Tariq
van Smeden, Maarten
Luo, Jiao
le Cessie, Saskia
Kastenmüller, Gabi
Krumsiek, Jan
Noordam, Raymond
van Heemst, Diana
Rosendaal, Frits R.
van Hylckama Vlieg, Astrid
Willems van Dijk, Ko
Mook-Kanamori, Dennis O.
author_sort Faquih, Tariq
collection PubMed
description Metabolomics studies have seen a steady growth due to the development and implementation of affordable and high-quality metabolomics platforms. In large metabolite panels, measurement values are frequently missing and, if neglected or sub-optimally imputed, can cause biased study results. We provided a publicly available, user-friendly R script to streamline the imputation of missing endogenous, unannotated, and xenobiotic metabolites. We evaluated the multivariate imputation by chained equations (MICE) and k-nearest neighbors (kNN) analyses implemented in our script by simulations using measured metabolites data from the Netherlands Epidemiology of Obesity (NEO) study (n = 599). We simulated missing values in four unique metabolites from different pathways with different correlation structures in three sample sizes (599, 150, 50) with three missing percentages (15%, 30%, 60%), and using two missing mechanisms (completely at random and not at random). Based on the simulations, we found that for MICE, larger sample size was the primary factor decreasing bias and error. For kNN, the primary factor reducing bias and error was the metabolite correlation with its predictor metabolites. MICE provided consistently higher performance measures particularly for larger datasets (n > 50). In conclusion, we presented an imputation workflow in a publicly available R script to impute untargeted metabolomics data. Our simulations provided insight into the effects of sample size, percentage missing, and correlation structure on the accuracy of the two imputation methods.
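The two missingness mechanisms named in the description can be illustrated with a minimal Python sketch. This is not the authors' R script. The function names `simulate_missing` and `observed_mean` are hypothetical, the data are synthetic log-normal draws rather than NEO measurements, and "not at random" is modeled here as the lowest values going missing (a left-censoring assumption that the record itself does not confirm).

```python
import random
import statistics


def simulate_missing(values, fraction, mechanism, seed=0):
    """Return a copy of `values` with round(len * fraction) entries set to None.

    mechanism="MCAR": entries are dropped uniformly at random.
    mechanism="MNAR": the lowest values are dropped (assumed left-censoring).
    """
    rng = random.Random(seed)
    n_missing = round(len(values) * fraction)
    indices = range(len(values))
    if mechanism == "MCAR":
        drop = set(rng.sample(indices, n_missing))
    elif mechanism == "MNAR":
        drop = set(sorted(indices, key=lambda i: values[i])[:n_missing])
    else:
        raise ValueError(f"unknown mechanism: {mechanism}")
    return [None if i in drop else v for i, v in enumerate(values)]


def observed_mean(values):
    """Mean of the non-missing entries (complete-case estimate)."""
    return statistics.fmean(v for v in values if v is not None)


# Synthetic stand-in for one metabolite measured in 599 participants.
data_rng = random.Random(42)
full = [data_rng.lognormvariate(0.0, 1.0) for _ in range(599)]

for mechanism in ("MCAR", "MNAR"):
    for fraction in (0.15, 0.30, 0.60):
        masked = simulate_missing(full, fraction, mechanism, seed=1)
        bias = observed_mean(masked) - statistics.fmean(full)
        print(f"{mechanism} {fraction:.0%} missing: bias of observed mean = {bias:+.3f}")
```

Under the left-censoring assumption the complete-case mean is always too high, which is the kind of bias an imputation method such as MICE or kNN is meant to reduce. Under MCAR the deviation is only sampling noise.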
format Online
Article
Text
id pubmed-7761057
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-77610572020-12-26 A Workflow for Missing Values Imputation of Untargeted Metabolomics Data Faquih, Tariq van Smeden, Maarten Luo, Jiao le Cessie, Saskia Kastenmüller, Gabi Krumsiek, Jan Noordam, Raymond van Heemst, Diana Rosendaal, Frits R. van Hylckama Vlieg, Astrid Willems van Dijk, Ko Mook-Kanamori, Dennis O. Metabolites Article Metabolomics studies have seen a steady growth due to the development and implementation of affordable and high-quality metabolomics platforms. In large metabolite panels, measurement values are frequently missing and, if neglected or sub-optimally imputed, can cause biased study results. We provided a publicly available, user-friendly R script to streamline the imputation of missing endogenous, unannotated, and xenobiotic metabolites. We evaluated the multivariate imputation by chained equations (MICE) and k-nearest neighbors (kNN) analyses implemented in our script by simulations using measured metabolites data from the Netherlands Epidemiology of Obesity (NEO) study (n = 599). We simulated missing values in four unique metabolites from different pathways with different correlation structures in three sample sizes (599, 150, 50) with three missing percentages (15%, 30%, 60%), and using two missing mechanisms (completely at random and not at random). Based on the simulations, we found that for MICE, larger sample size was the primary factor decreasing bias and error. For kNN, the primary factor reducing bias and error was the metabolite correlation with its predictor metabolites. MICE provided consistently higher performance measures particularly for larger datasets (n > 50). In conclusion, we presented an imputation workflow in a publicly available R script to impute untargeted metabolomics data. Our simulations provided insight into the effects of sample size, percentage missing, and correlation structure on the accuracy of the two imputation methods. 
MDPI 2020-11-26 /pmc/articles/PMC7761057/ /pubmed/33256233 http://dx.doi.org/10.3390/metabo10120486 Text en © 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
title A Workflow for Missing Values Imputation of Untargeted Metabolomics Data
topic Article