
Bayesian inference for biomarker discovery in proteomics: an analytic solution


Bibliographic details
Main authors: Dridi, Noura, Giremus, Audrey, Giovannelli, Jean-Francois, Truntzer, Caroline, Hadzagic, Melita, Charrier, Jean-Philippe, Gerfault, Laurent, Ducoroy, Patrick, Lacroix, Bruno, Grangeat, Pierre, Roy, Pascal
Format: Online article, text
Language: English
Published: Springer International Publishing, 2017
Subjects: Research
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5511129/
https://www.ncbi.nlm.nih.gov/pubmed/28710702
http://dx.doi.org/10.1186/s13637-017-0062-4
collection PubMed
description This paper addresses the question of biomarker discovery in proteomics. Given clinical data on a list of proteins for a set of individuals, the problem is to extract a short subset of proteins whose concentrations indicate the biological status (healthy or pathological). In this paper, it is formulated as a specific instance of variable selection. The novelty is that, rather than examining the proteins one at a time, the method directly seeks the best partition between discriminant and non-discriminant proteins. In this way, correlations between the proteins are intrinsically taken into account in the decision. The strategy is derived in a Bayesian setting, and the decision is optimal in the sense that it minimizes a global mean error. It ultimately relies on the posterior probabilities of the partitions. The main difficulty is computing these probabilities, since they depend on the so-called evidence, which requires marginalizing over all the unknown model parameters. Two models are presented that relate the status to the protein concentrations, depending on whether the proteins are biomarkers or not. The first model accounts for biological variability by assuming that the concentrations are Gaussian distributed with a mean and a covariance matrix that depend on the status only for the biomarkers. The second extends it to also account for technical variability, which may significantly affect the observed concentrations. The main contributions of the paper are: (1) a new Bayesian formulation of the biomarker selection problem, (2) a closed-form expression for the posterior probabilities in the noiseless case, and (3) a suitable approximate solution in the noisy case.
The methods are numerically assessed and compared with state-of-the-art methods (t test, LASSO, Bhattacharyya distance, FOHSIC) on synthetic data and on real data from proteins quantified in human serum by mass spectrometry in selected reaction monitoring mode.
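The description explains selecting a whole partition of proteins by comparing posterior probabilities built from closed-form evidence. The sketch below illustrates that idea at toy scale for the noiseless case only; it is not the authors' model or code. Assumptions (all hypothetical, not taken from the paper): a Normal-inverse-Wishart prior with zero prior mean and hyperparameters `kappa0`, `lam0` and `nu0 = d + 2`; biomarker and non-biomarker blocks treated as independent; a uniform prior over subsets of a fixed size `k`; and the MAP partition as the decision (the minimizer of a 0-1 loss on partitions). The function names are illustrative.

```python
import math
from itertools import combinations


def log_det_spd(a):
    """Log-determinant of a symmetric positive-definite matrix via Cholesky."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return 2.0 * sum(math.log(low[i][i]) for i in range(n))


def log_multigamma(a, d):
    """Log of the multivariate gamma function Gamma_d(a)."""
    return d * (d - 1) / 4.0 * math.log(math.pi) + sum(
        math.lgamma(a + (1 - j) / 2.0) for j in range(1, d + 1))


def log_evidence(rows, kappa0=0.01, lam0=1.0):
    """Closed-form log marginal likelihood of Gaussian rows under a
    hypothetical Normal-inverse-Wishart prior: mu0 = 0, Lambda0 = lam0 * I,
    nu0 = d + 2. The mean and covariance are integrated out analytically."""
    n, d = len(rows), len(rows[0])
    nu0 = d + 2
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    kn, nun = kappa0 + n, nu0 + n
    # Posterior scale: Lambda0 + scatter matrix + shrinkage term toward mu0 = 0.
    lam_n = [[(lam0 if i == j else 0.0)
              + sum((r[i] - mean[i]) * (r[j] - mean[j]) for r in rows)
              + kappa0 * n / kn * mean[i] * mean[j]
              for j in range(d)] for i in range(d)]
    return (-n * d / 2.0 * math.log(math.pi)
            + log_multigamma(nun / 2.0, d) - log_multigamma(nu0 / 2.0, d)
            + nu0 / 2.0 * d * math.log(lam0)  # log|Lambda0| = d * log(lam0)
            - nun / 2.0 * log_det_spd(lam_n)
            + d / 2.0 * (math.log(kappa0) - math.log(kn)))


def select_biomarkers(x, y, k):
    """Score every size-k subset S of proteins (columns of x) as biomarkers.
    Biomarker columns get class-specific Gaussian parameters (y in {0, 1},
    both classes present); the other columns share one Gaussian across classes.
    Returns the MAP subset and the normalized posterior over all subsets."""
    p = len(x[0])

    def cols(rows, idx):
        return [[r[j] for j in idx] for r in rows]

    scores = {}
    for s in combinations(range(p), k):
        rest = [j for j in range(p) if j not in s]
        lev = sum(log_evidence(cols([r for r, c in zip(x, y) if c == cls], s))
                  for cls in (0, 1))
        if rest:
            lev += log_evidence(cols(x, rest))
        scores[s] = lev  # uniform prior: posterior is proportional to evidence
    m = max(scores.values())
    z = sum(math.exp(v - m) for v in scores.values())
    post = {s: math.exp(v - m) / z for s, v in scores.items()}
    return max(post, key=post.get), post
```

Because every partition describes the same data with the same total dimension, the evidences are directly comparable. Exhaustive enumeration grows combinatorially with the number of proteins, so this brute-force version only suits a handful of candidates, and the zero prior mean assumes roughly centered concentrations.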
id pubmed-5511129
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
journal EURASIP J Bioinform Syst Biol
published_online 2017-07-14
rights © The Author(s) 2017. Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
topic Research