Cargando…

Baldur: Bayesian Hierarchical Modeling for Label-Free Proteomics with Gamma Regressing Mean-Variance Trends

Label-free proteomics is a fast-growing methodology to infer abundances in mass spectrometry proteomics. Extensive research has focused on spectral quantification and peptide identification. However, research toward modeling and understanding quantitative proteomics data is scarce. Here we propose a...

Descripción completa

Detalles Bibliográficos
Autores principales: Berg, Philip, Popescu, George
Formato: Online Artículo Texto
Lenguaje:English
Publicado: American Society for Biochemistry and Molecular Biology 2023
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10687340/
https://www.ncbi.nlm.nih.gov/pubmed/37806340
http://dx.doi.org/10.1016/j.mcpro.2023.100658
Descripción
Sumario:Label-free proteomics is a fast-growing methodology to infer abundances in mass spectrometry proteomics. Extensive research has focused on spectral quantification and peptide identification. However, research toward modeling and understanding quantitative proteomics data is scarce. Here we propose a Bayesian hierarchical decision model (Baldur) to test for differences in means between conditions for proteins, peptides, and post-translational modifications. We developed a Bayesian regression model to characterize local mean-variance trends in data, to estimate measurement uncertainty and hyperparameters for the decision model. A key contribution is the development of a new gamma regression model that describes the mean-variance dependency as a mixture of a common and a latent trend—allowing for localized trend estimates. We then evaluate the performance of Baldur, limma-trend, and t test on six benchmark datasets: five total proteomics and one post-translational modification dataset. We find that Baldur drastically improves the decision in noisier post-translational modification data over limma-trend and t test. In addition, we see significant improvements using Baldur over the other methods in the total proteomics datasets. Finally, we analyzed Baldur’s performance when increasing the number of replicates and found that the method always increases precision with sample size, while showing robust control of the false positive rate. We conclude that our model vastly improves over popular data analysis methods (limma-trend and t test) in several spike-in datasets by achieving a high true positive detection rate, while greatly reducing the false-positive rate.