
Rank‐based Bayesian variable selection for genome‐wide transcriptomic analyses


Bibliographic Details
Main Authors: Eliseussen, Emilie; Fleischer, Thomas; Vitelli, Valeria
Format: Online, Article, Text
Language: English
Published: John Wiley & Sons, Inc., 2022
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9796757/
https://www.ncbi.nlm.nih.gov/pubmed/35844145
http://dx.doi.org/10.1002/sim.9524
collection PubMed
description Variable selection is crucial in high‐dimensional omics‐based analyses, since it is biologically reasonable to assume only a subset of non‐noisy features contributes to the data structures. However, the task is particularly hard in an unsupervised setting, and a priori ad hoc variable selection is still a very frequent approach, despite the evident drawbacks and lack of reproducibility. We propose a Bayesian variable selection approach for rank‐based unsupervised transcriptomic analysis. Making use of data rankings instead of the actual continuous measurements increases the robustness of conclusions when compared to classical statistical methods, and embedding variable selection into the inferential tasks allows complete reproducibility. Specifically, we develop a novel extension of the Bayesian Mallows model for variable selection that allows for a full probabilistic analysis, leading to coherent quantification of uncertainties. Simulation studies demonstrate the versatility and robustness of the proposed method in a variety of scenarios, as well as its superiority with respect to several competitors when varying the data dimension or data generating process. We use the novel approach to analyze genome‐wide RNAseq gene expression data from ovarian cancer patients: several genes that affect cancer development are correctly detected in a completely unsupervised fashion, showing the usefulness of the method in the context of signature discovery for cancer genomics. Moreover, the possibility to also perform uncertainty quantification plays a key role in the subsequent biological investigation.
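The rank-based idea summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' method or software. The helper names (`to_ranks`, `footrule`, `mallows_log_prob`) are hypothetical. Ranks are assumed to run from 1 for the highest expression within a sample. The Mallows model is written with the footrule distance and an assumed α/n scaling of the distance. The normalizing constant is computed by brute-force enumeration, which is only feasible for a handful of genes. The variable-selection extension proposed in the paper is not reproduced here.

```python
import math
from itertools import permutations


def to_ranks(values):
    """Rank the values of one sample: 1 = highest value (ties broken by position).

    Assumption for illustration: rank 1 marks the most expressed gene.
    """
    order = sorted(range(len(values)), key=lambda i: -values[i])
    ranks = [0] * len(values)
    for position, i in enumerate(order, start=1):
        ranks[i] = position
    return ranks


def footrule(r, rho):
    """Spearman footrule distance: sum of absolute rank differences."""
    return sum(abs(a - b) for a, b in zip(r, rho))


def mallows_log_prob(r, rho, alpha):
    """Log-probability of ranking r under a Mallows model with consensus rho.

    Assumed form: P(r) = exp(-(alpha / n) * d(r, rho)) / Z_n(alpha), d = footrule.
    Z_n(alpha) is summed over all n! permutations, so keep n small.
    """
    n = len(rho)
    log_z = math.log(sum(
        math.exp(-alpha / n * footrule(p, rho))
        for p in permutations(range(1, n + 1))
    ))
    return -alpha / n * footrule(r, rho) - log_z


# Toy expression values for four genes in one sample (made-up numbers).
expression = [5.2, 0.3, 12.0, 7.7]
ranks = to_ranks(expression)
# A log transform preserves the ordering, so the ranks are unchanged.
assert ranks == to_ranks([math.log(v) for v in expression])
print(ranks)  # [3, 4, 1, 2]
print(mallows_log_prob(ranks, rho=[3, 4, 1, 2], alpha=2.0))
```

Ranks depend only on the ordering of the values. Any strictly increasing transformation of the measurements, such as a log transform, therefore leaves them unchanged. This invariance is one source of the robustness that rank-based analyses offer.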
id pubmed-9796757
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
journal Stat Med
dates 2022-07-18; 2022-10-15
rights © 2022 The Authors. Statistics in Medicine published by John Wiley & Sons Ltd. This is an open access article under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits use, distribution and reproduction in any medium, provided the original work is properly cited.
topic Research Articles