A Workflow for Missing Values Imputation of Untargeted Metabolomics Data
Metabolomics studies have seen a steady growth due to the development and implementation of affordable and high-quality metabolomics platforms. In large metabolite panels, measurement values are frequently missing and, if neglected or sub-optimally imputed, can cause biased study results. We provide...
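Main authors: | Faquih, Tariq; van Smeden, Maarten; Luo, Jiao; le Cessie, Saskia; Kastenmüller, Gabi; Krumsiek, Jan; Noordam, Raymond; van Heemst, Diana; Rosendaal, Frits R.; van Hylckama Vlieg, Astrid; Willems van Dijk, Ko; Mook-Kanamori, Dennis O. |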
---|---|
Format: | Online Article Text |
Language: | English |
Published: | MDPI 2020 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7761057/ https://www.ncbi.nlm.nih.gov/pubmed/33256233 http://dx.doi.org/10.3390/metabo10120486 |
_version_ | 1783627478936322048 |
---|---|
author | Faquih, Tariq; van Smeden, Maarten; Luo, Jiao; le Cessie, Saskia; Kastenmüller, Gabi; Krumsiek, Jan; Noordam, Raymond; van Heemst, Diana; Rosendaal, Frits R.; van Hylckama Vlieg, Astrid; Willems van Dijk, Ko; Mook-Kanamori, Dennis O. |
author_facet | Faquih, Tariq; van Smeden, Maarten; Luo, Jiao; le Cessie, Saskia; Kastenmüller, Gabi; Krumsiek, Jan; Noordam, Raymond; van Heemst, Diana; Rosendaal, Frits R.; van Hylckama Vlieg, Astrid; Willems van Dijk, Ko; Mook-Kanamori, Dennis O. |
author_sort | Faquih, Tariq |
collection | PubMed |
description | Metabolomics studies have seen a steady growth due to the development and implementation of affordable and high-quality metabolomics platforms. In large metabolite panels, measurement values are frequently missing and, if neglected or sub-optimally imputed, can cause biased study results. We provided a publicly available, user-friendly R script to streamline the imputation of missing endogenous, unannotated, and xenobiotic metabolites. We evaluated the multivariate imputation by chained equations (MICE) and k-nearest neighbors (kNN) analyses implemented in our script by simulations using measured metabolites data from the Netherlands Epidemiology of Obesity (NEO) study (n = 599). We simulated missing values in four unique metabolites from different pathways with different correlation structures in three sample sizes (599, 150, 50) with three missing percentages (15%, 30%, 60%), and using two missing mechanisms (completely at random and not at random). Based on the simulations, we found that for MICE, larger sample size was the primary factor decreasing bias and error. For kNN, the primary factor reducing bias and error was the metabolite correlation with its predictor metabolites. MICE provided consistently higher performance measures particularly for larger datasets (n > 50). In conclusion, we presented an imputation workflow in a publicly available R script to impute untargeted metabolomics data. Our simulations provided insight into the effects of sample size, percentage missing, and correlation structure on the accuracy of the two imputation methods. |
format | Online Article Text |
id | pubmed-7761057 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | MDPI |
record_format | MEDLINE/PubMed |
spelling | pubmed-7761057 2020-12-26 A Workflow for Missing Values Imputation of Untargeted Metabolomics Data Faquih, Tariq; van Smeden, Maarten; Luo, Jiao; le Cessie, Saskia; Kastenmüller, Gabi; Krumsiek, Jan; Noordam, Raymond; van Heemst, Diana; Rosendaal, Frits R.; van Hylckama Vlieg, Astrid; Willems van Dijk, Ko; Mook-Kanamori, Dennis O. Metabolites Article MDPI 2020-11-26 /pmc/articles/PMC7761057/ /pubmed/33256233 http://dx.doi.org/10.3390/metabo10120486 Text en © 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/). |
title | A Workflow for Missing Values Imputation of Untargeted Metabolomics Data |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7761057/ https://www.ncbi.nlm.nih.gov/pubmed/33256233 http://dx.doi.org/10.3390/metabo10120486 |
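
The simulation design summarized in the description field (three sample sizes, three missing percentages, two missingness mechanisms) can be sketched as a small scenario grid. The following Python snippet is only an illustration and is not the authors' R script: the function names `mask_mcar` and `mask_mnar_low` are hypothetical, the data are synthetic log-normal values rather than NEO measurements, and the "not at random" mechanism is assumed here to remove the lowest values, which the record itself does not specify.

```python
import itertools

import numpy as np


def mask_mcar(values, fraction, rng):
    """Set a random `fraction` of entries to NaN, independent of their values."""
    out = values.astype(float).copy()
    n_missing = int(round(fraction * out.size))
    idx = rng.choice(out.size, size=n_missing, replace=False)
    out[idx] = np.nan
    return out


def mask_mnar_low(values, fraction):
    """Set the lowest `fraction` of entries to NaN (an assumed MNAR mechanism)."""
    out = values.astype(float).copy()
    n_missing = int(round(fraction * out.size))
    idx = np.argsort(out)[:n_missing]
    out[idx] = np.nan
    return out


# Scenario grid taken from the abstract: 3 sample sizes x 3 percentages x 2 mechanisms.
sample_sizes = (599, 150, 50)
missing_fractions = (0.15, 0.30, 0.60)
mechanisms = ("MCAR", "MNAR")
scenarios = list(itertools.product(sample_sizes, missing_fractions, mechanisms))

rng = np.random.default_rng(0)
missing_counts = {}
for n, frac, mech in scenarios:
    x = rng.lognormal(mean=0.0, sigma=1.0, size=n)  # synthetic stand-in for one metabolite
    if mech == "MCAR":
        masked = mask_mcar(x, frac, rng)
    else:
        masked = mask_mnar_low(x, frac)
    missing_counts[(n, frac, mech)] = int(np.isnan(masked).sum())
```

Each masked vector would then be passed to an imputation method (MICE or kNN in the study) and compared against the original values; that step is omitted here because the record does not describe its settings.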