
NMF-Based Approach for Missing Values Imputation of Mass Spectrometry Metabolomics Data

In mass spectrometry (MS)-based metabolomics, missing values (NAs) may be due to different causes, including sample heterogeneity, ion suppression, spectral overlap, inappropriate data processing, and instrumental errors. Although a number of methodologies have been applied to handle NAs, NA imputat...


Bibliographic Details
Main authors: Xu, Jingjing; Wang, Yuanshan; Xu, Xiangnan; Cheng, Kian-Kai; Raftery, Daniel; Dong, Jiyang
Format: Online Article Text
Language: English
Published: MDPI 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8510447/
https://www.ncbi.nlm.nih.gov/pubmed/34641330
http://dx.doi.org/10.3390/molecules26195787
author Xu, Jingjing
Wang, Yuanshan
Xu, Xiangnan
Cheng, Kian-Kai
Raftery, Daniel
Dong, Jiyang
collection PubMed
description In mass spectrometry (MS)-based metabolomics, missing values (NAs) may be due to different causes, including sample heterogeneity, ion suppression, spectral overlap, inappropriate data processing, and instrumental errors. Although a number of methodologies have been applied to handle NAs, NA imputation remains a challenging problem. Here, we propose a non-negative matrix factorization (NMF)-based method for NA imputation in MS-based metabolomics data, which makes use of both global and local information of the data. The proposed method was compared with three commonly used methods: k-nearest neighbors (kNN), random forest (RF), and outlier-robust (ORI) missing values imputation. These methods were evaluated from the perspectives of accuracy of imputation, retrieval of data structures, and rank of imputation superiority. The experimental results showed that the NMF-based method is well-adapted to various cases of data missingness and the presence of outliers in MS-based metabolic profiles. It outperformed kNN and ORI and showed results comparable with the RF method. Furthermore, the NMF method is more robust and less susceptible to outliers as compared with the RF method. The proposed NMF-based scheme may serve as an alternative NA imputation method which may facilitate biological interpretations of metabolomics data.
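The description above names NMF as the basis for imputation but does not give the algorithm. As a rough illustration only, the sketch below fits a plain masked NMF with multiplicative updates to the observed entries and fills each NA from the low-rank reconstruction. It is not the authors' method, which also uses local information from the data. The function name `nmf_impute` and the parameters `rank`, `n_iter`, `eps`, and `seed` are hypothetical choices made for this example, and the input is assumed to be a non-negative samples-by-features intensity matrix with NAs encoded as NaN.

```python
import numpy as np

def nmf_impute(X, rank=2, n_iter=500, eps=1e-9, seed=0):
    """Fill NaN entries of a non-negative matrix using masked NMF.

    Illustrative sketch: X is approximated by W @ H with W, H >= 0,
    and only observed entries contribute to the fit.
    """
    X = np.asarray(X, dtype=float)
    mask = ~np.isnan(X)              # True where a value was observed
    M = mask.astype(float)           # 1.0 for observed, 0.0 for missing
    X0 = np.where(mask, X, 0.0)      # missing entries zeroed; M hides them

    rng = np.random.default_rng(seed)
    W = rng.random((X.shape[0], rank)) + eps
    H = rng.random((rank, X.shape[1])) + eps

    for _ in range(n_iter):
        # Multiplicative updates for the masked squared-error objective.
        H *= (W.T @ X0) / (W.T @ (M * (W @ H)) + eps)
        W *= (X0 @ H.T) / ((M * (W @ H)) @ H.T + eps)

    X_hat = W @ H
    # Keep observed values untouched; take estimates only where data are missing.
    return np.where(mask, X, X_hat)

# Usage: a small synthetic rank-2 matrix with a few entries removed.
rng = np.random.default_rng(1)
truth = rng.random((8, 2)) @ rng.random((2, 6))
data = truth.copy()
data[0, 1] = np.nan
data[3, 4] = np.nan
data[6, 2] = np.nan
filled = nmf_impute(data, rank=2)
```

Because the factors stay non-negative, every imputed value is non-negative as well, which matches the nature of MS intensity data. In practice the rank would have to be chosen per data set, for example by masking known values and comparing the reconstruction error.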
format Online
Article
Text
id pubmed-8510447
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-8510447 2021-10-13 NMF-Based Approach for Missing Values Imputation of Mass Spectrometry Metabolomics Data Xu, Jingjing Wang, Yuanshan Xu, Xiangnan Cheng, Kian-Kai Raftery, Daniel Dong, Jiyang Molecules Article In mass spectrometry (MS)-based metabolomics, missing values (NAs) may be due to different causes, including sample heterogeneity, ion suppression, spectral overlap, inappropriate data processing, and instrumental errors. Although a number of methodologies have been applied to handle NAs, NA imputation remains a challenging problem. Here, we propose a non-negative matrix factorization (NMF)-based method for NA imputation in MS-based metabolomics data, which makes use of both global and local information of the data. The proposed method was compared with three commonly used methods: k-nearest neighbors (kNN), random forest (RF), and outlier-robust (ORI) missing values imputation. These methods were evaluated from the perspectives of accuracy of imputation, retrieval of data structures, and rank of imputation superiority. The experimental results showed that the NMF-based method is well-adapted to various cases of data missingness and the presence of outliers in MS-based metabolic profiles. It outperformed kNN and ORI and showed results comparable with the RF method. Furthermore, the NMF method is more robust and less susceptible to outliers as compared with the RF method. The proposed NMF-based scheme may serve as an alternative NA imputation method which may facilitate biological interpretations of metabolomics data. MDPI 2021-09-24 /pmc/articles/PMC8510447/ /pubmed/34641330 http://dx.doi.org/10.3390/molecules26195787 Text en © 2021 by the authors. https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title NMF-Based Approach for Missing Values Imputation of Mass Spectrometry Metabolomics Data
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8510447/
https://www.ncbi.nlm.nih.gov/pubmed/34641330
http://dx.doi.org/10.3390/molecules26195787