Gene expression data classification using topology and machine learning models

Bibliographic details
Main authors: Dey, Tamal K., Mandal, Sayan, Mukherjee, Soham
Format: Online Article Text
Language: English
Published: BioMed Central 2022
Subjects: Research
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9121583/
https://www.ncbi.nlm.nih.gov/pubmed/35596135
http://dx.doi.org/10.1186/s12859-022-04704-z
author Dey, Tamal K.
Mandal, Sayan
Mukherjee, Soham
collection PubMed
description BACKGROUND: Interpretation of high-throughput gene expression data continues to require mathematical tools for data analysis that recognize the shape of the data in high dimensions. Topological data analysis (TDA) has recently been successful in extracting robust features in several applications dealing with high-dimensional constructs. In this work, we use some recent developments in TDA to curate gene expression data. Our work differs from its predecessors in two aspects: (1) Traditional TDA pipelines use topological signatures called barcodes to enhance feature vectors that are then used for classification. In contrast, this work curates relevant features with the help of TDA to obtain better representatives of the entire data set, which facilitate better comprehension of the phenotype labels. (2) Most earlier works use barcodes obtained from topological summaries as fingerprints for the data. Although these are stable signatures, there is no direct mapping between the data and the barcodes. RESULTS: The topology-relevant curated data we obtain improve both shallow-learning and deep-learning-based supervised classification. We further show that the representative cycles we compute have an unsupervised inclination towards the phenotype labels. This work thus shows that topological signatures are able to capture gene expression levels and classify cohorts accordingly. CONCLUSIONS: In this work, we compute representative persistent cycles to characterize the gene expression data. These cycles allow us to directly identify genes involved in similar processes.
format Online
Article
Text
id pubmed-9121583
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-9121583 2022-05-21 Gene expression data classification using topology and machine learning models Dey, Tamal K. Mandal, Sayan Mukherjee, Soham BMC Bioinformatics Research
BioMed Central 2022-05-20 /pmc/articles/PMC9121583/ /pubmed/35596135 http://dx.doi.org/10.1186/s12859-022-04704-z Text en © The Author(s) 2022 https://creativecommons.org/licenses/by/4.0/ Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third-party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (https://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title Gene expression data classification using topology and machine learning models
topic Research
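The abstract contrasts the authors' approach with "traditional" TDA pipelines that turn barcodes into feature vectors for a classifier. As a rough illustration of that baseline only (not the authors' representative-cycle method, which the record does not detail), the sketch below computes a 0-dimensional persistence barcode of the Vietoris-Rips filtration for samples treated as points in gene-expression space, then condenses it into a fixed-length vector. The Euclidean metric, the top-k bar-length summary, and the names `h0_barcode` and `barcode_features` are all illustrative assumptions; a real pipeline would typically use a dedicated TDA library and higher homology dimensions.

```python
import math
from itertools import combinations


def h0_barcode(points):
    """0-dimensional persistence barcode of the Vietoris-Rips filtration.

    Assumption: each point is one sample (a vector of expression levels) and
    distances are Euclidean. Every point is born at scale 0; a connected
    component dies at the edge length where it merges into another one,
    which is exactly Kruskal's minimum-spanning-tree procedure.
    """
    n = len(points)
    if n == 0:
        return []
    parent = list(range(n))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All pairwise edges, sorted by length: this is the filtration order.
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )
    bars = []
    for length, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            bars.append((0.0, length))  # one component dies at this scale
    bars.append((0.0, math.inf))  # the last component never dies
    return bars


def barcode_features(bars, k):
    """Hypothetical fixed-length summary: the k longest finite bar lengths,
    sorted in descending order and zero-padded to length k."""
    lengths = sorted((d - b for b, d in bars if math.isfinite(d)), reverse=True)
    return lengths[:k] + [0.0] * max(0, k - len(lengths))
```

For example, three samples at (0, 0), (1, 0) and (5, 0) give the barcode [(0.0, 1.0), (0.0, 4.0), (0.0, inf)], and `barcode_features(bars, 3)` returns [4.0, 1.0, 0.0]. A vector like this could be appended to a sample's other features; the abstract's point is that such summaries have no direct mapping back to individual genes.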