Missing Value Imputation Approach for Mass Spectrometry-based Metabolomics Data

Bibliographic Details
Main authors: Wei, Runmin, Wang, Jingye, Su, Mingming, Jia, Erik, Chen, Shaoqiu, Chen, Tianlu, Ni, Yan
Format: Online Article Text
Language: English
Published: Nature Publishing Group UK 2018
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5766532/
https://www.ncbi.nlm.nih.gov/pubmed/29330539
http://dx.doi.org/10.1038/s41598-017-19120-0
collection PubMed
description Missing values are widespread in mass spectrometry (MS)-based metabolomics data. Various methods have been applied to handle them, but the choice of method can significantly affect subsequent data analyses. Missing values are typically of three types: missing not at random (MNAR), missing at random (MAR), and missing completely at random (MCAR). Our study comprehensively compared eight imputation methods (zero, half minimum (HM), mean, median, random forest (RF), singular value decomposition (SVD), k-nearest neighbors (kNN), and quantile regression imputation of left-censored data (QRILC)) for different types of missing values using four metabolomics datasets. Normalized root mean squared error (NRMSE) and NRMSE-based sum of ranks (SOR) were used to evaluate imputation accuracy. Principal component analysis (PCA)/partial least squares (PLS)-Procrustes analysis was used to evaluate the overall sample distribution. Student’s t-test followed by correlation analysis was conducted to evaluate the effects on univariate statistics. Our findings demonstrated that RF performed best for MCAR/MAR and QRILC was favored for left-censored MNAR. Finally, we proposed a comprehensive strategy and developed a publicly accessible web tool for applying missing value imputation in metabolomics (https://metabolomics.cc.hawaii.edu/software/MetImp/).
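The four simple imputation strategies and the NRMSE accuracy metric named in the description can be illustrated with a short sketch. This Python is an illustration only, not code from the paper or from MetImp. The function names (`impute`, `nrmse`), the data layout (rows are samples, columns are metabolites, `None` marks a missing value), and the simulated data are all assumptions. NRMSE is computed with one common definition: the square root of the mean squared error over the masked entries divided by the population variance of their true values. The paper's exact normalization may differ. Only zero, HM, mean, and median imputation are shown; RF, SVD, kNN, and QRILC are not.

```python
import math
import random
import statistics


def impute(data, method):
    """Column-wise single-value imputation (illustrative sketch; hypothetical name).

    data: list of rows (samples) of floats, with None marking a missing value.
    method: "zero", "hm" (half minimum), "mean", or "median".
    Assumes every column has at least one observed value.
    """
    n_cols = len(data[0])
    fills = []
    for j in range(n_cols):
        observed = [row[j] for row in data if row[j] is not None]
        if method == "zero":
            fills.append(0.0)
        elif method == "hm":
            fills.append(min(observed) / 2)  # half of the smallest observed value
        elif method == "mean":
            fills.append(statistics.fmean(observed))
        elif method == "median":
            fills.append(statistics.median(observed))
        else:
            raise ValueError(f"unknown method: {method!r}")
    # Replace each None with its column's fill value; keep observed values as-is.
    return [[fills[j] if v is None else v for j, v in enumerate(row)] for row in data]


def nrmse(truth, imputed, mask):
    """NRMSE over the masked (i, j) entries, assuming the definition
    sqrt(mean((true - imputed)^2) / var(true)) with population variance."""
    true_vals = [truth[i][j] for i, j in mask]
    imp_vals = [imputed[i][j] for i, j in mask]
    mse = sum((t - p) ** 2 for t, p in zip(true_vals, imp_vals)) / len(true_vals)
    return math.sqrt(mse / statistics.pvariance(true_vals))


if __name__ == "__main__":
    rng = random.Random(0)
    n_samples, n_metabolites = 20, 5
    # Hypothetical complete data set: log-normal "intensities".
    truth = [[rng.lognormvariate(2.0, 0.5) for _ in range(n_metabolites)]
             for _ in range(n_samples)]
    # MCAR: hide 10 entries chosen uniformly at random.
    cells = [(i, j) for i in range(n_samples) for j in range(n_metabolites)]
    mask = rng.sample(cells, 10)
    hidden = set(mask)
    data = [[None if (i, j) in hidden else v for j, v in enumerate(row)]
            for i, row in enumerate(truth)]
    for method in ("zero", "hm", "mean", "median"):
        score = nrmse(truth, impute(data, method), mask)
        print(f"{method:>6}: NRMSE = {score:.3f}")
```

Because entries are hidden uniformly at random, this simulates MCAR only. A left-censored MNAR simulation would instead hide the lowest intensities, which is the setting in which the description reports QRILC as favored.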
id pubmed-5766532
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-5766532 2018-01-17 Sci Rep Article
Nature Publishing Group UK 2018-01-12
© The Author(s) 2018. Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
topic Article