GMSimpute: a generalized two-step Lasso approach to impute missing values in label-free mass spectrum analysis

Bibliographic Details
Main Authors:	Li, Qian; Fisher, Kate; Meng, Wenjun; Fang, Bin; Welsh, Eric; Haura, Eric B; Koomen, John M; Eschrich, Steven A; Fridley, Brooke L; Chen, Y Ann
Format:	Online Article Text
Language:	English
Published:	Oxford University Press 2020
Subjects:	Original Papers
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6956786/ https://www.ncbi.nlm.nih.gov/pubmed/31199438 http://dx.doi.org/10.1093/bioinformatics/btz488

author	Li, Qian; Fisher, Kate; Meng, Wenjun; Fang, Bin; Welsh, Eric; Haura, Eric B; Koomen, John M; Eschrich, Steven A; Fridley, Brooke L; Chen, Y Ann
collection	PubMed
description	MOTIVATION: Missingness in label-free mass spectrometry is inherent to the technology, so a computational approach to recovering missing values in metabolomics and proteomics datasets is important. Most existing methods are designed under a particular assumption: either missing at random or below the detection limit. If the missing pattern deviates from that assumption, the imputation may produce biased results. We therefore investigate the missing patterns in label-free mass spectrometry data and develop GMSimpute, an omnibus approach that allows effective imputation across different missing patterns. RESULTS: Analyses of three proteomics datasets and one metabolomics dataset indicate that missing values could be a mixture of abundance-dependent and abundance-independent missingness. We assess the performance of GMSimpute using simulated data (80 missing patterns covering a wide range) and metabolomics data from the Cancer Genome Atlas breast cancer and clear cell renal cell carcinoma studies. Using Pearson correlation and normalized root mean square error between the true and imputed abundances, we compare its performance with K-nearest neighbors (KNN)-type approaches, Random Forest, GSimp, a model-based method implemented in DanteR, and minimum-value imputation. The results indicate that GMSimpute provides higher imputation accuracy and performs stably across different missing patterns. In addition, when applied to the Cancer Genome Atlas datasets, GMSimpute identifies features in downstream differential expression analysis with high accuracy. AVAILABILITY AND IMPLEMENTATION: GMSimpute is on CRAN: https://cran.r-project.org/web/packages/GMSimpute/index.html. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
format	Online Article Text
id	pubmed-6956786
institution	National Center for Biotechnology Information
language	English
publishDate	2020
publisher	Oxford University Press
record_format	MEDLINE/PubMed
spelling	pubmed-6956786 2020-01-16 Bioinformatics Original Papers Oxford University Press 2020-01-01 2019-06-14 /pmc/articles/PMC6956786/ /pubmed/31199438 http://dx.doi.org/10.1093/bioinformatics/btz488 Text en © The Author(s) 2019. Published by Oxford University Press. http://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title	GMSimpute: a generalized two-step Lasso approach to impute missing values in label-free mass spectrum analysis
topic	Original Papers
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6956786/ https://www.ncbi.nlm.nih.gov/pubmed/31199438 http://dx.doi.org/10.1093/bioinformatics/btz488
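As a rough illustration of the evaluation described in the abstract, the Python sketch below simulates a mixture of abundance-dependent and abundance-independent missingness, fills the gaps with the per-feature minimum (one of the listed comparators), and scores the result with Pearson correlation and normalized root mean square error (NRMSE) on the masked entries. It is not GMSimpute's two-step Lasso algorithm, which is distributed as an R package on CRAN. The censoring model (masking the globally lowest values as a detection limit), the NRMSE normalization (dividing by the standard deviation of the true values), the rows-as-features layout, and all function names are assumptions made for this example.

```python
import numpy as np


def simulate_mixed_missingness(x, total_rate=0.2, mnar_fraction=0.5, rng=None):
    """Return a boolean mask (True = missing) mixing two mechanisms.

    Assumed model: the abundance-dependent part censors the smallest values
    (a simple detection limit); the abundance-independent part drops
    further entries uniformly at random (missing completely at random).
    """
    rng = np.random.default_rng(rng)
    n_total = int(round(total_rate * x.size))
    n_mnar = int(round(mnar_fraction * n_total))
    n_mcar = n_total - n_mnar

    flat_mask = np.zeros(x.size, dtype=bool)
    # Abundance-dependent: mask the n_mnar lowest values of the whole matrix.
    order = np.argsort(x, axis=None)
    flat_mask[order[:n_mnar]] = True
    # Abundance-independent: mask n_mcar of the remaining entries at random.
    remaining = np.flatnonzero(~flat_mask)
    flat_mask[rng.choice(remaining, size=n_mcar, replace=False)] = True
    return flat_mask.reshape(x.shape)


def impute_minimum(x_obs):
    """Replace NaNs in each feature (row) with that row's minimum observed value.

    A row with no observed values stays NaN (numpy emits a warning).
    """
    row_min = np.nanmin(x_obs, axis=1, keepdims=True)
    return np.where(np.isnan(x_obs), row_min, x_obs)


def pearson_and_nrmse(x_true, x_imp, mask):
    """Pearson r and NRMSE between true and imputed values on masked entries.

    NRMSE here is RMSE divided by the standard deviation of the true values
    (an assumed normalization; other definitions exist).
    """
    t = x_true[mask]
    p = x_imp[mask]
    r = np.corrcoef(t, p)[0, 1]
    nrmse = np.sqrt(np.mean((p - t) ** 2)) / np.std(t)
    return r, nrmse


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical log-abundance matrix: 50 features x 20 samples.
    x_true = rng.normal(loc=20.0, scale=2.0, size=(50, 20))
    mask = simulate_mixed_missingness(x_true, total_rate=0.2, mnar_fraction=0.5, rng=1)
    x_obs = np.where(mask, np.nan, x_true)
    x_imp = impute_minimum(x_obs)
    r, nrmse = pearson_and_nrmse(x_true, x_imp, mask)
    print(f"missing={mask.mean():.2f} pearson={r:.3f} nrmse={nrmse:.3f}")
```

Varying `mnar_fraction` between 0 and 1 moves the simulated data from purely random to purely detection-limit missingness, which is the kind of spectrum of missing patterns an omnibus method is meant to handle; a minimum-value fill tends to suit the censored end of that range better than the random end.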

GMSimpute: a generalized two-step Lasso approach to impute missing values in label-free mass spectrum analysis

Similar Items