Comparative assessment and novel strategy on methods for imputing proteomics data
Missing values are a major issue in quantitative proteomics analysis. While many methods have been developed for imputing missing values in high-throughput proteomics data, a comparative assessment of imputation accuracy remains inconclusive, mainly because mechanisms contributing to true missing va...
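Main authors: | Shen, Minjie; Chang, Yi-Tan; Wu, Chiung-Ting; Parker, Sarah J.; Saylor, Georgia; Wang, Yizhi; Yu, Guoqiang; Van Eyk, Jennifer E.; Clarke, Robert; Herrington, David M.; Wang, Yue |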
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Nature Publishing Group UK, 2022 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8776850/ https://www.ncbi.nlm.nih.gov/pubmed/35058491 http://dx.doi.org/10.1038/s41598-022-04938-0 |
_version_ | 1784636928619970560 |
---|---|
author | Shen, Minjie Chang, Yi-Tan Wu, Chiung-Ting Parker, Sarah J. Saylor, Georgia Wang, Yizhi Yu, Guoqiang Van Eyk, Jennifer E. Clarke, Robert Herrington, David M. Wang, Yue |
collection | PubMed |
description | Missing values are a major issue in quantitative proteomics analysis. While many methods have been developed for imputing missing values in high-throughput proteomics data, a comparative assessment of imputation accuracy remains inconclusive, mainly because mechanisms contributing to true missing values are complex and existing evaluation methodologies are imperfect. Moreover, few studies have provided an outlook of future methodological development. We first re-evaluate the performance of eight representative methods targeting three typical missing mechanisms. These methods are compared on both simulated and masked missing values embedded within real proteomics datasets, and performance is evaluated using three quantitative measures. We then introduce fused regularization matrix factorization, a low-rank global matrix factorization framework, capable of integrating local similarity derived from additional data types. We also explore a biologically-inspired latent variable modeling strategy—convex analysis of mixtures—for missing value imputation and present preliminary experimental results. While some winners emerged from our comparative assessment, the evaluation is intrinsically imperfect because performance is evaluated indirectly on artificial missing or masked values not authentic missing values. Nevertheless, we show that our fused regularization matrix factorization provides a novel incorporation of external and local information, and the exploratory implementation of convex analysis of mixtures presents a biologically plausible new approach. |
format | Online Article Text |
id | pubmed-8776850 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Nature Publishing Group UK |
record_format | MEDLINE/PubMed |
spelling | pubmed-8776850 2022-01-24 Sci Rep Article Nature Publishing Group UK 2022-01-20 /pmc/articles/PMC8776850/ /pubmed/35058491 http://dx.doi.org/10.1038/s41598-022-04938-0 Text en © The Author(s) 2022. Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/. |
title | Comparative assessment and novel strategy on methods for imputing proteomics data |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8776850/ https://www.ncbi.nlm.nih.gov/pubmed/35058491 http://dx.doi.org/10.1038/s41598-022-04938-0 |