Deep learning approach for cancer subtype classification using high-dimensional gene expression data

MOTIVATION: Studies have shown that classifying cancer subtypes can provide valuable information for a range of cancer research, from aetiology and tumour biology to prognosis and personalized treatment. Current methods usually adopt gene expression data to perform cancer subtype classification. How...

Bibliographic Details
Main authors: Shen, Jiquan; Shi, Jiawei; Luo, Junwei; Zhai, Haixia; Liu, Xiaoyan; Wu, Zhengjiang; Yan, Chaokun; Luo, Huimin
Format: Online Article Text
Language: English
Published: BioMed Central 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9575247/
https://www.ncbi.nlm.nih.gov/pubmed/36253710
http://dx.doi.org/10.1186/s12859-022-04980-9
author Shen, Jiquan
Shi, Jiawei
Luo, Junwei
Zhai, Haixia
Liu, Xiaoyan
Wu, Zhengjiang
Yan, Chaokun
Luo, Huimin
author_sort Shen, Jiquan
collection PubMed
description MOTIVATION: Studies have shown that classifying cancer subtypes can provide valuable information for a range of cancer research, from aetiology and tumour biology to prognosis and personalized treatment. Current methods usually adopt gene expression data to perform cancer subtype classification. However, cancer samples are scarce, and the high-dimensional features of their gene expression data are too sparse to allow most methods to achieve desirable classification results. RESULTS: In this paper, we propose a deep learning approach by combining a convolutional neural network (CNN) and bidirectional gated recurrent unit (BiGRU): our approach, DCGN, aims to achieve nonlinear dimensionality reduction and learn features to eliminate irrelevant factors in gene expression data. Specifically, DCGN first uses the synthetic minority oversampling technique algorithm to equalize data. The CNN can handle high-dimensional data without stress and extract important local features, and the BiGRU can analyse deep features and retain their important information; the DCGN captures key features by combining both neural networks to overcome the challenges of small sample sizes and sparse, high-dimensional features. In the experiments, we compared the DCGN to seven other cancer subtype classification methods using breast and bladder cancer gene expression datasets. The experimental results show that the DCGN performs better than the other seven methods and can provide more satisfactory classification results. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12859-022-04980-9.
format Online
Article
Text
id pubmed-9575247
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-9575247 2022-10-18 Deep learning approach for cancer subtype classification using high-dimensional gene expression data Shen, Jiquan Shi, Jiawei Luo, Junwei Zhai, Haixia Liu, Xiaoyan Wu, Zhengjiang Yan, Chaokun Luo, Huimin BMC Bioinformatics Research
BioMed Central 2022-10-17 /pmc/articles/PMC9575247/ /pubmed/36253710 http://dx.doi.org/10.1186/s12859-022-04980-9 Text en © The Author(s) 2022 https://creativecommons.org/licenses/by/4.0/ Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title Deep learning approach for cancer subtype classification using high-dimensional gene expression data
topic Research