Comparative assessment and novel strategy on methods for imputing proteomics data

Missing values are a major issue in quantitative proteomics analysis. While many methods have been developed for imputing missing values in high-throughput proteomics data, a comparative assessment of imputation accuracy remains inconclusive, mainly because mechanisms contributing to true missing va...

Bibliographic Details
Main authors: Shen, Minjie, Chang, Yi-Tan, Wu, Chiung-Ting, Parker, Sarah J., Saylor, Georgia, Wang, Yizhi, Yu, Guoqiang, Van Eyk, Jennifer E., Clarke, Robert, Herrington, David M., Wang, Yue
Format: Online Article Text
Language: English
Published: Nature Publishing Group UK 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8776850/
https://www.ncbi.nlm.nih.gov/pubmed/35058491
http://dx.doi.org/10.1038/s41598-022-04938-0
author Shen, Minjie
Chang, Yi-Tan
Wu, Chiung-Ting
Parker, Sarah J.
Saylor, Georgia
Wang, Yizhi
Yu, Guoqiang
Van Eyk, Jennifer E.
Clarke, Robert
Herrington, David M.
Wang, Yue
author_sort Shen, Minjie
collection PubMed
description Missing values are a major issue in quantitative proteomics analysis. While many methods have been developed for imputing missing values in high-throughput proteomics data, a comparative assessment of imputation accuracy remains inconclusive, mainly because mechanisms contributing to true missing values are complex and existing evaluation methodologies are imperfect. Moreover, few studies have provided an outlook of future methodological development. We first re-evaluate the performance of eight representative methods targeting three typical missing mechanisms. These methods are compared on both simulated and masked missing values embedded within real proteomics datasets, and performance is evaluated using three quantitative measures. We then introduce fused regularization matrix factorization, a low-rank global matrix factorization framework, capable of integrating local similarity derived from additional data types. We also explore a biologically-inspired latent variable modeling strategy—convex analysis of mixtures—for missing value imputation and present preliminary experimental results. While some winners emerged from our comparative assessment, the evaluation is intrinsically imperfect because performance is evaluated indirectly on artificial missing or masked values not authentic missing values. Nevertheless, we show that our fused regularization matrix factorization provides a novel incorporation of external and local information, and the exploratory implementation of convex analysis of mixtures presents a biologically plausible new approach.
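The masked-value evaluation mentioned in the description can be illustrated with a minimal sketch. It is not the authors' code: the synthetic low-rank matrix, the 10% random masking, the NRMSE score, and the iterative truncated-SVD imputer are all assumptions chosen for illustration. The SVD imputer is a generic low-rank stand-in, not fused regularization matrix factorization, and every function name below is hypothetical.

```python
import numpy as np


def mask_at_random(X, frac, rng):
    """Hide a random fraction of entries; return the masked copy and the mask."""
    mask = rng.random(X.shape) < frac
    X_masked = X.copy()
    X_masked[mask] = np.nan
    return X_masked, mask


def impute_row_mean(X):
    """Baseline: fill each missing entry with the mean of its row."""
    filled = X.copy()
    missing = np.isnan(X)
    row_means = np.nanmean(X, axis=1)
    rows, _ = np.where(missing)
    filled[missing] = row_means[rows]
    return filled


def impute_low_rank(X, rank, n_iter=50):
    """Iterative truncated-SVD imputation (generic low-rank stand-in, not FRMF)."""
    missing = np.isnan(X)
    filled = impute_row_mean(X)  # start from the baseline fill
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        approx = (U[:, :rank] * s[:rank]) @ Vt[:rank]
        filled[missing] = approx[missing]  # observed entries are never changed
    return filled


def nrmse(truth, estimate, mask):
    """Root-mean-square error on masked entries, scaled by their standard deviation."""
    diff = truth[mask] - estimate[mask]
    return np.sqrt(np.mean(diff ** 2)) / np.std(truth[mask])


# Synthetic "proteins x samples" matrix with rank-3 structure plus small noise
# (an assumption; real proteomics data would be loaded here instead).
rng = np.random.default_rng(0)
X_true = rng.normal(size=(100, 3)) @ rng.normal(size=(3, 20))
X_true += 0.05 * rng.normal(size=X_true.shape)

X_masked, mask = mask_at_random(X_true, frac=0.10, rng=rng)
mean_filled = impute_row_mean(X_masked)
low_rank_filled = impute_low_rank(X_masked, rank=3)

print(f"NRMSE, row mean: {nrmse(X_true, mean_filled, mask):.3f}")
print(f"NRMSE, low rank: {nrmse(X_true, low_rank_filled, mask):.3f}")
```

Because the score is computed only on entries that were hidden on purpose, it shares the limitation the description itself raises: it measures performance on masked values, not on authentic missing values.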
format Online
Article
Text
id pubmed-8776850
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Nature Publishing Group UK
record_format MEDLINE/PubMed
spelling pubmed-87768502022-01-24 Comparative assessment and novel strategy on methods for imputing proteomics data Shen, Minjie Chang, Yi-Tan Wu, Chiung-Ting Parker, Sarah J. Saylor, Georgia Wang, Yizhi Yu, Guoqiang Van Eyk, Jennifer E. Clarke, Robert Herrington, David M. Wang, Yue Sci Rep Article Missing values are a major issue in quantitative proteomics analysis. While many methods have been developed for imputing missing values in high-throughput proteomics data, a comparative assessment of imputation accuracy remains inconclusive, mainly because mechanisms contributing to true missing values are complex and existing evaluation methodologies are imperfect. Moreover, few studies have provided an outlook of future methodological development. We first re-evaluate the performance of eight representative methods targeting three typical missing mechanisms. These methods are compared on both simulated and masked missing values embedded within real proteomics datasets, and performance is evaluated using three quantitative measures. We then introduce fused regularization matrix factorization, a low-rank global matrix factorization framework, capable of integrating local similarity derived from additional data types. We also explore a biologically-inspired latent variable modeling strategy—convex analysis of mixtures—for missing value imputation and present preliminary experimental results. While some winners emerged from our comparative assessment, the evaluation is intrinsically imperfect because performance is evaluated indirectly on artificial missing or masked values not authentic missing values. Nevertheless, we show that our fused regularization matrix factorization provides a novel incorporation of external and local information, and the exploratory implementation of convex analysis of mixtures presents a biologically plausible new approach. 
Nature Publishing Group UK 2022-01-20 /pmc/articles/PMC8776850/ /pubmed/35058491 http://dx.doi.org/10.1038/s41598-022-04938-0 Text en © The Author(s) 2022 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
title Comparative assessment and novel strategy on methods for imputing proteomics data
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8776850/
https://www.ncbi.nlm.nih.gov/pubmed/35058491
http://dx.doi.org/10.1038/s41598-022-04938-0