Neither random nor censored: estimating intensity-dependent probabilities for missing values in label-free proteomics

MOTIVATION: Mass spectrometry proteomics is a powerful tool in biomedical research but its usefulness is limited by the frequent occurrence of missing values in peptides that cannot be reliably quantified (detected) for particular samples. Many analysis strategies have been proposed for missing valu...

Bibliographic Details
Main Authors:	Li, Mengbo, Smyth, Gordon K
Format:	Online Article Text
Language:	English
Published:	Oxford University Press 2023
Subjects:	Original Paper
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10174703/ https://www.ncbi.nlm.nih.gov/pubmed/37067487 http://dx.doi.org/10.1093/bioinformatics/btad200

author	Li, Mengbo Smyth, Gordon K
collection	PubMed
description	MOTIVATION: Mass spectrometry proteomics is a powerful tool in biomedical research, but its usefulness is limited by the frequent occurrence of missing values in peptides that cannot be reliably quantified (detected) for particular samples. Many analysis strategies have been proposed for missing values, where the discussion often focuses on distinguishing whether values are missing completely at random (MCAR), missing at random (MAR) or missing not at random (MNAR). RESULTS: Statistical models and algorithms are proposed for estimating the detection probabilities and for evaluating how much statistical information can or cannot be recovered from the missing value pattern. The probability that an intensity is detected is shown to be accurately modeled as a logit-linear function of the underlying intensity, showing that the missing value process is intermediate between MAR and censoring. The detection probability asymptotes to 100% for high intensities, showing that missing values unrelated to intensity are rare. The rule applies globally to each dataset and is appropriate for both highly and lowly expressed peptides. A probability model is developed that allows the distribution of unobserved intensities to be inferred from the observed values. The detection probability model is incorporated into a likelihood-based approach for assessing differential expression and successfully recovers statistical power compared to omitting the missing values from the analysis. In contrast, imputation methods are shown to perform poorly, either reducing statistical power or increasing the false discovery rate to unacceptable levels. AVAILABILITY AND IMPLEMENTATION: Data and code to reproduce the results shown in this article are available from https://mengbo-li.github.io/protDP/.
format	Online Article Text
id	pubmed-10174703
institution	National Center for Biotechnology Information
language	English
publishDate	2023
publisher	Oxford University Press
record_format	MEDLINE/PubMed
spelling	pubmed-10174703 2023-05-12 Neither random nor censored: estimating intensity-dependent probabilities for missing values in label-free proteomics Li, Mengbo Smyth, Gordon K Bioinformatics Original Paper Oxford University Press 2023-04-17 /pmc/articles/PMC10174703/ /pubmed/37067487 http://dx.doi.org/10.1093/bioinformatics/btad200 Text en © The Author(s) 2023. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title	Neither random nor censored: estimating intensity-dependent probabilities for missing values in label-free proteomics
topic	Original Paper
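
The abstract in this record describes the detection probability as a logit-linear function of the underlying intensity. The following is a minimal Python sketch of such a curve, offered only to illustrate that one sentence. It is not the authors' protDP code: the function name `detection_probability` and the coefficients `BETA0` and `BETA1` are hypothetical, and the intensities are assumed to be on a log scale.

```python
import math


def detection_probability(intensity, beta0, beta1):
    """Logit-linear detection curve: logit(p) = beta0 + beta1 * intensity.

    Written in a numerically stable form so that very steep slopes
    do not overflow math.exp.
    """
    eta = beta0 + beta1 * intensity
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)
    return z / (1.0 + z)


# Hypothetical coefficients, chosen only for illustration (not estimates
# from any dataset in the article).
BETA0, BETA1 = -8.0, 0.8

if __name__ == "__main__":
    for y in (6.0, 10.0, 14.0, 20.0):
        p = detection_probability(y, BETA0, BETA1)
        print(f"log-intensity {y:5.1f} -> detection probability {p:.3f}")
```

Under these assumptions, a slope of zero gives a detection probability that does not depend on intensity at all, while a very steep slope approaches a step function that behaves like hard censoring at the threshold `-beta0 / beta1`. A moderate positive slope sits between those two extremes and approaches 1 for high intensities, which is the behaviour the abstract reports.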