Complex Disease Individual Molecular Characterization Using Infinite Sparse Graphical Independent Component Analysis
Main authors: | Rincourt, Sarah-Laure; Michiels, Stefan; Drubay, Damien |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | SAGE Publications, 2022 |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9290103/ https://www.ncbi.nlm.nih.gov/pubmed/35860346 http://dx.doi.org/10.1177/11769351221105776 |
_version_ | 1784748812341870592 |
---|---|
author | Rincourt, Sarah-Laure; Michiels, Stefan; Drubay, Damien |
collection | PubMed |
description | Identifying individual mechanisms involved in complex diseases, such as cancer, is essential for precision medicine. Their characterization is particularly challenging due to the unknown relationships within high-dimensional omics data and their inter-patient heterogeneity. We propose to model individual gene expression as a combination of unobserved molecular mechanisms (molecular components) that may differ between individuals. Given a baseline molecular profile common to all individuals, these molecular components may represent molecular pathways that differ from the population background. We defined an infinite sparse graphical independent component analysis (isgICA) to identify these molecular components. This model relies on double sparseness: the source matrix sparseness defines the subset of genes involved in each molecular component, whereas the weight matrix sparseness identifies the subset of molecular components associated with each patient. Because the number of molecular components is unknown but likely high, we inferred it jointly with the weight matrix sparseness using the beta-Bernoulli process (BBP). To evaluate how well our model reconstructs the latent structures, we simulated data from a double sparse ICA with 10 or 30 components with specific sparseness structures, for 100 or 500 individuals and 500, 1000, or 5000 genes, at different noise variance levels. In all simulations, the isgICA reconstructed the number of components and the weight and source matrix sparsenesses (correlation between simulated and estimated values >0.8) more accurately than 2 state-of-the-art methods (ica and fastICA). Applied to the expression of 1063 genes in 614 breast cancer patients, the isgICA identified 22 components. According to the source matrix, 7 of these 22 components appeared to be specifically related to 3 known molecular pathways with a prognostic effect in early breast cancer (immune system, proliferation, and stroma invasion). The proposed algorithm provides insight into individual molecular heterogeneity to better understand complex disease mechanisms. |
format | Online Article Text |
id | pubmed-9290103 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | SAGE Publications |
record_format | MEDLINE/PubMed |
spelling | pubmed-9290103, 2022-07-19; Cancer Inform; Original Research; SAGE Publications, 2022-07-15; /pmc/articles/PMC9290103/ /pubmed/35860346 http://dx.doi.org/10.1177/11769351221105776; Text en; © The Author(s) 2022, https://creativecommons.org/licenses/by-nc/4.0/. This article is distributed under the terms of the Creative Commons Attribution-NonCommercial 4.0 License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial use, reproduction and distribution of the work without further permission provided the original work is attributed as specified on the SAGE and Open Access page (https://us.sagepub.com/en-us/nam/open-access-at-sage). |
title | Complex Disease Individual Molecular Characterization Using Infinite Sparse Graphical Independent Component Analysis |
topic | Original Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9290103/ https://www.ncbi.nlm.nih.gov/pubmed/35860346 http://dx.doi.org/10.1177/11769351221105776 |
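The description field above outlines a generative model: each individual's expression is a shared baseline profile plus a sparse, individual-specific combination of sparse molecular components, roughly X = mu + (Z ∘ W) S + E. The sketch below is a hypothetical simulator of that kind of data, not the authors' isgICA code. The function names, the Gaussian weights, the Laplace-distributed sources, the default parameter values, and the use of a finite beta-Bernoulli prior (pi_k ~ Beta(alpha/K, 1), z_ik ~ Bernoulli(pi_k)) as a truncated stand-in for the beta-Bernoulli process are all assumptions.

```python
import random


def sample_laplace(rng: random.Random) -> float:
    """Standard Laplace(0, 1) draw: the difference of two independent Exp(1) draws."""
    return rng.expovariate(1.0) - rng.expovariate(1.0)


def simulate_double_sparse_ica(n_individuals, n_genes, k_max, alpha=3.0,
                               source_density=0.1, noise_sd=0.5, seed=0):
    """Hypothetical simulator for X = mu + (Z * W) S + E (not the authors' code).

    Assumptions: Gaussian weights W, Laplace sources, and a finite beta-Bernoulli
    prior pi_k ~ Beta(alpha / k_max, 1), z_ik ~ Bernoulli(pi_k), standing in for
    the beta-Bernoulli process truncated at k_max components.
    """
    rng = random.Random(seed)
    # Baseline profile shared by all individuals (the population background).
    mu = [rng.gauss(0.0, 1.0) for _ in range(n_genes)]
    # Weight matrix sparseness: which components are active in each individual.
    pi = [rng.betavariate(alpha / k_max, 1.0) for _ in range(k_max)]
    z = [[1 if rng.random() < pi[k] else 0 for k in range(k_max)]
         for _ in range(n_individuals)]
    # Individual weights are nonzero only where the binary mask z is on.
    a = [[z[i][k] * rng.gauss(0.0, 1.0) for k in range(k_max)]
         for i in range(n_individuals)]
    # Source matrix sparseness: each component involves a random subset of genes,
    # with non-Gaussian (Laplace) values, since ICA relies on non-Gaussian sources.
    s = [[sample_laplace(rng) if rng.random() < source_density else 0.0
          for _ in range(n_genes)]
         for _ in range(k_max)]
    # Observed expression: baseline + individual mixture of components + noise.
    x = [[mu[g] + sum(a[i][k] * s[k][g] for k in range(k_max))
          + rng.gauss(0.0, noise_sd)
          for g in range(n_genes)]
         for i in range(n_individuals)]
    return {"X": x, "Z": z, "S": s, "mu": mu}


if __name__ == "__main__":
    # Sizes borrowed from the smallest simulation setting quoted in the abstract.
    sim = simulate_double_sparse_ica(n_individuals=100, n_genes=500, k_max=10)
    used = sum(any(row[k] for row in sim["Z"]) for k in range(10))
    print(len(sim["X"]), len(sim["X"][0]), used)
```

With pi_k ~ Beta(alpha/K, 1), each individual activates K·alpha/(alpha + K) components on average, which tends to alpha as the truncation level K grows. This is the property that lets a beta-Bernoulli prior leave the number of components open instead of fixing it in advance.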