Multiomics Topic Modeling for Breast Cancer Classification

Bibliographic Details
Main Authors: Valle, Filippo; Osella, Matteo; Caselle, Michele
Format: Online Article Text
Language: English
Published: MDPI 2022
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8909787/
https://www.ncbi.nlm.nih.gov/pubmed/35267458
http://dx.doi.org/10.3390/cancers14051150
author Valle, Filippo
Osella, Matteo
Caselle, Michele
collection PubMed
description SIMPLE SUMMARY: Topic models are algorithms introduced for discovering hidden topics or latent variables in large, unstructured text corpora. By leveraging analogies between texts and gene expression profiles, these algorithms can be used to find structures in expression data. This work presents an application of topic modeling techniques to the identification of breast cancer subtypes. In particular, we extended a specific class of topic models to allow a multiomics approach. As an illustrative example, considering both messenger RNA and microRNA expression levels, we were able to clearly distinguish healthy from tumor samples as well as the different breast cancer subtypes. The integration of different layers of information is crucial for the observed classification accuracy. Our approach naturally provides the genes and the microRNAs associated with the specific topics that are used for sample organization. We show that these topics often contain genes involved in breast cancer development and are associated with different survival probabilities.
ABSTRACT: The integration of transcriptional data with other layers of information, such as the post-transcriptional regulation mediated by microRNAs, can be crucial for identifying the driver genes and the subtypes of complex and heterogeneous diseases such as cancer. This paper presents an approach based on topic modeling to accomplish this integration task. More specifically, we show how an algorithm based on a hierarchical version of stochastic block modeling can be naturally extended to integrate any combination of omics data. We test this approach on breast cancer samples from the TCGA database, integrating data on messenger RNA, microRNAs, and copy number variations. We show that the inclusion of the microRNA layer significantly improves the accuracy of subtype classification. Moreover, some of the hidden structures or “topics” that the algorithm extracts correspond to genes and microRNAs involved in breast cancer development and are associated with survival probability.
format Online
Article
Text
id pubmed-8909787
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-8909787 2022-03-11 Multiomics Topic Modeling for Breast Cancer Classification. Valle, Filippo; Osella, Matteo; Caselle, Michele. Cancers (Basel). Article. MDPI 2022-02-23 /pmc/articles/PMC8909787/ /pubmed/35267458 http://dx.doi.org/10.3390/cancers14051150 Text en © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title Multiomics Topic Modeling for Breast Cancer Classification
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8909787/
https://www.ncbi.nlm.nih.gov/pubmed/35267458
http://dx.doi.org/10.3390/cancers14051150