Comparative assessment and novel strategy on methods for imputing proteomics data

Missing values are a major issue in quantitative proteomics analysis. While many methods have been developed for imputing missing values in high-throughput proteomics data, a comparative assessment of imputation accuracy remains inconclusive, mainly because mechanisms contributing to true missing va...

Bibliographic Details
Main Authors:	Shen, Minjie; Chang, Yi-Tan; Wu, Chiung-Ting; Parker, Sarah J.; Saylor, Georgia; Wang, Yizhi; Yu, Guoqiang; Van Eyk, Jennifer E.; Clarke, Robert; Herrington, David M.; Wang, Yue
Format:	Online Article Text
Language:	English
Published:	Nature Publishing Group UK 2022
Subjects:	Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8776850/ https://www.ncbi.nlm.nih.gov/pubmed/35058491 http://dx.doi.org/10.1038/s41598-022-04938-0

_version_	1784636928619970560
author	Shen, Minjie Chang, Yi-Tan Wu, Chiung-Ting Parker, Sarah J. Saylor, Georgia Wang, Yizhi Yu, Guoqiang Van Eyk, Jennifer E. Clarke, Robert Herrington, David M. Wang, Yue
author_facet	Shen, Minjie Chang, Yi-Tan Wu, Chiung-Ting Parker, Sarah J. Saylor, Georgia Wang, Yizhi Yu, Guoqiang Van Eyk, Jennifer E. Clarke, Robert Herrington, David M. Wang, Yue
author_sort	Shen, Minjie
collection	PubMed
description	Missing values are a major issue in quantitative proteomics analysis. While many methods have been developed for imputing missing values in high-throughput proteomics data, a comparative assessment of imputation accuracy remains inconclusive, mainly because mechanisms contributing to true missing values are complex and existing evaluation methodologies are imperfect. Moreover, few studies have provided an outlook of future methodological development. We first re-evaluate the performance of eight representative methods targeting three typical missing mechanisms. These methods are compared on both simulated and masked missing values embedded within real proteomics datasets, and performance is evaluated using three quantitative measures. We then introduce fused regularization matrix factorization, a low-rank global matrix factorization framework, capable of integrating local similarity derived from additional data types. We also explore a biologically-inspired latent variable modeling strategy—convex analysis of mixtures—for missing value imputation and present preliminary experimental results. While some winners emerged from our comparative assessment, the evaluation is intrinsically imperfect because performance is evaluated indirectly on artificial missing or masked values not authentic missing values. Nevertheless, we show that our fused regularization matrix factorization provides a novel incorporation of external and local information, and the exploratory implementation of convex analysis of mixtures presents a biologically plausible new approach.
format	Online Article Text
id	pubmed-8776850
institution	National Center for Biotechnology Information
language	English
publishDate	2022
publisher	Nature Publishing Group UK
record_format	MEDLINE/PubMed
spelling	pubmed-8776850 2022-01-24 Comparative assessment and novel strategy on methods for imputing proteomics data Shen, Minjie Chang, Yi-Tan Wu, Chiung-Ting Parker, Sarah J. Saylor, Georgia Wang, Yizhi Yu, Guoqiang Van Eyk, Jennifer E. Clarke, Robert Herrington, David M. Wang, Yue Sci Rep Article
Nature Publishing Group UK 2022-01-20 /pmc/articles/PMC8776850/ /pubmed/35058491 http://dx.doi.org/10.1038/s41598-022-04938-0 Text en © The Author(s) 2022 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ (https://creativecommons.org/licenses/by/4.0/).
spellingShingle	Article Shen, Minjie Chang, Yi-Tan Wu, Chiung-Ting Parker, Sarah J. Saylor, Georgia Wang, Yizhi Yu, Guoqiang Van Eyk, Jennifer E. Clarke, Robert Herrington, David M. Wang, Yue Comparative assessment and novel strategy on methods for imputing proteomics data
title	Comparative assessment and novel strategy on methods for imputing proteomics data
title_sort	comparative assessment and novel strategy on methods for imputing proteomics data
topic	Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8776850/ https://www.ncbi.nlm.nih.gov/pubmed/35058491 http://dx.doi.org/10.1038/s41598-022-04938-0