Multiomics Topic Modeling for Breast Cancer Classification
Main authors: | Valle, Filippo; Osella, Matteo; Caselle, Michele |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | MDPI, 2022 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8909787/ https://www.ncbi.nlm.nih.gov/pubmed/35267458 http://dx.doi.org/10.3390/cancers14051150 |
field | value |
---|---|
author | Valle, Filippo; Osella, Matteo; Caselle, Michele |
collection | PubMed |
description | SIMPLE SUMMARY: Topic models are algorithms introduced for discovering hidden topics or latent variables in large, unstructured text corpora. Leveraging analogies between texts and gene expression profiles, these algorithms can be used to find structures in expression data. This work presents an application of topic modeling techniques for the identification of breast cancer subtypes. In particular, we extended a specific class of topic models to allow a multiomics approach. As an illustrative example, considering both messenger RNA and microRNA expression levels, we were able to clearly distinguish healthy from tumor samples as well as the different breast cancer subtypes. The integration of different layers of information is crucial for the observed classification accuracy. Our approach naturally provides the genes and the microRNAs associated with the specific topics that are used for sample organization. We show that indeed these topics often contain genes involved in breast cancer development and are associated with different survival probabilities. ABSTRACT: The integration of transcriptional data with other layers of information, such as the post-transcriptional regulation mediated by microRNAs, can be crucial for identifying the driver genes and the subtypes of complex and heterogeneous diseases such as cancer. This paper presents an approach based on topic modeling to accomplish this integration task. More specifically, we show how an algorithm based on a hierarchical version of stochastic block modeling can be naturally extended to integrate any combination of ’omics data. We test this approach on breast cancer samples from the TCGA database, integrating data on messenger RNA, microRNAs, and copy number variations. We show that the inclusion of the microRNA layer significantly improves the accuracy of subtype classification. Moreover, some of the hidden structures or “topics” that the algorithm extracts actually correspond to genes and microRNAs involved in breast cancer development and are associated with survival probability. |
format | Online Article Text |
id | pubmed-8909787 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | MDPI |
record_format | MEDLINE/PubMed |
spelling | pubmed-8909787 (2022-03-11); Cancers (Basel), Article; MDPI, 2022-02-23; http://dx.doi.org/10.3390/cancers14051150; © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). |
title | Multiomics Topic Modeling for Breast Cancer Classification |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8909787/ https://www.ncbi.nlm.nih.gov/pubmed/35267458 http://dx.doi.org/10.3390/cancers14051150 |