
Complex Disease Individual Molecular Characterization Using Infinite Sparse Graphical Independent Component Analysis


Bibliographic Details
Main authors: Rincourt, Sarah-Laure, Michiels, Stefan, Drubay, Damien
Format: Online Article Text
Language: English
Published: SAGE Publications, 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9290103/
https://www.ncbi.nlm.nih.gov/pubmed/35860346
http://dx.doi.org/10.1177/11769351221105776
author Rincourt, Sarah-Laure
Michiels, Stefan
Drubay, Damien
collection PubMed
description Identifying individual mechanisms involved in complex diseases, such as cancer, is essential for precision medicine. Their characterization is particularly challenging due to the unknown relationships within high-dimensional omics data and their inter-patient heterogeneity. We propose to model individual gene expression as a combination of unobserved molecular mechanisms (molecular components) that may differ between individuals. Considering a baseline molecular profile common to all individuals, these molecular components may represent molecular pathways differing from the population background. We defined an infinite sparse graphical independent component analysis (isgICA) to identify these molecular components. This model relies on double sparseness: the source matrix sparseness defines the subset of genes involved in each molecular component, whereas the weight matrix sparseness identifies the subset of molecular components associated with each patient. As the number of molecular components is unknown but likely high, we simultaneously inferred it and the weight matrix sparseness using the beta-Bernoulli process (BBP). To evaluate the reconstruction of the latent structures by our model, we simulated data from a double sparse ICA with 10/30 components with specific sparseness structures for 100/500 individuals and 500/1000/5000 genes, with different noise variance levels. For all simulations, the isgICA reconstructed the number of components and the weight and source matrix sparsenesses with higher accuracy than 2 state-of-the-art methods (ica and fastICA) (correlation between simulated and estimated values >0.8). Applying our model to the expression of 1063 genes in 614 breast cancer patients, the isgICA identified 22 components. According to the source matrix, 7 of these 22 components seemed to be specifically related to 3 known molecular pathways with a prognostic effect in early breast cancer (immune system, proliferation, and stroma invasion). This proposed algorithm provides an insight into individual molecular heterogeneity to better understand complex disease mechanisms.
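The abstract describes each individual's expression as a shared baseline plus a sparse combination of molecular components, with a beta-Bernoulli process deciding which components each patient carries. The Python sketch below simulates data from such a doubly sparse model to make that structure concrete. It is not the authors' isgICA code: the function name `simulate_double_sparse_ica` is hypothetical, and the priors (a finite Beta(alpha/K, 1) approximation of the beta-Bernoulli process, Gaussian weights, spike-and-slab sources with Laplace slabs, Gaussian noise) are illustrative assumptions.

```python
import random


def simulate_double_sparse_ica(n_individuals, n_genes, k_max, alpha=3.0,
                               source_density=0.1, noise_sd=0.5, seed=0):
    """Draw X = baseline + W S + noise from a doubly sparse generative model.

    Hypothetical sketch: W (individuals x components) is sparsified by a
    finite beta-Bernoulli approximation truncated at k_max components, and
    S (components x genes) by a spike-and-slab prior with Laplace slabs.
    These priors are assumptions, not the isgICA specification.
    """
    rng = random.Random(seed)

    # Baseline molecular profile shared by all individuals (one value per gene).
    baseline = [rng.gauss(0.0, 1.0) for _ in range(n_genes)]

    # Beta-Bernoulli: individual n carries component k with probability pi_k.
    pi = [rng.betavariate(alpha / k_max, 1.0) for _ in range(k_max)]
    z = [[rng.random() < pi[k] for k in range(k_max)]
         for _ in range(n_individuals)]
    weights = [[rng.gauss(0.0, 1.0) if z[n][k] else 0.0 for k in range(k_max)]
               for n in range(n_individuals)]

    # Laplace(0, 1) draw as the difference of two Exp(1) draws (non-Gaussian slab).
    def laplace():
        return rng.expovariate(1.0) - rng.expovariate(1.0)

    # Spike-and-slab sources: each component involves a sparse subset of genes.
    sources = [[laplace() if rng.random() < source_density else 0.0
                for _ in range(n_genes)] for _ in range(k_max)]

    # Expression = baseline + weighted sum of components + Gaussian noise.
    x = [[baseline[g]
          + sum(weights[n][k] * sources[k][g] for k in range(k_max))
          + rng.gauss(0.0, noise_sd)
          for g in range(n_genes)] for n in range(n_individuals)]

    # Components carried by at least one individual (the "number of components").
    active = [k for k in range(k_max)
              if any(z[n][k] for n in range(n_individuals))]
    return x, weights, sources, active


x, w, s, active = simulate_double_sparse_ica(100, 500, k_max=30)
print(len(x), len(x[0]))  # 100 500
print(len(active))        # components actually used; varies with the seed
```

Here `k_max` is only a truncation level: in the infinite model the number of active components is inferred, so `len(active)` is typically smaller than `k_max`.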
format Online
Article
Text
id pubmed-9290103
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher SAGE Publications
record_format MEDLINE/PubMed
spelling pubmed-9290103 (2022-07-19). Cancer Inform, Original Research. SAGE Publications, published 2022-07-15. Text, en.
© The Author(s) 2022. This article is distributed under the terms of the Creative Commons Attribution-NonCommercial 4.0 License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial use, reproduction and distribution of the work without further permission provided the original work is attributed as specified on the SAGE and Open Access page (https://us.sagepub.com/en-us/nam/open-access-at-sage).
title Complex Disease Individual Molecular Characterization Using Infinite Sparse Graphical Independent Component Analysis
topic Original Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9290103/
https://www.ncbi.nlm.nih.gov/pubmed/35860346
http://dx.doi.org/10.1177/11769351221105776