Deep learning approach for cancer subtype classification using high-dimensional gene expression data
MOTIVATION: Studies have shown that classifying cancer subtypes can provide valuable information for a range of cancer research, from aetiology and tumour biology to prognosis and personalized treatment. Current methods usually adopt gene expression data to perform cancer subtype classification. How...
Main Authors: | Shen, Jiquan; Shi, Jiawei; Luo, Junwei; Zhai, Haixia; Liu, Xiaoyan; Wu, Zhengjiang; Yan, Chaokun; Luo, Huimin |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2022 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9575247/ https://www.ncbi.nlm.nih.gov/pubmed/36253710 http://dx.doi.org/10.1186/s12859-022-04980-9 |
_version_ | 1784811276858294272 |
---|---|
author | Shen, Jiquan Shi, Jiawei Luo, Junwei Zhai, Haixia Liu, Xiaoyan Wu, Zhengjiang Yan, Chaokun Luo, Huimin |
author_facet | Shen, Jiquan Shi, Jiawei Luo, Junwei Zhai, Haixia Liu, Xiaoyan Wu, Zhengjiang Yan, Chaokun Luo, Huimin |
author_sort | Shen, Jiquan |
collection | PubMed |
description | MOTIVATION: Studies have shown that classifying cancer subtypes can provide valuable information for a range of cancer research, from aetiology and tumour biology to prognosis and personalized treatment. Current methods usually adopt gene expression data to perform cancer subtype classification. However, cancer samples are scarce, and the high-dimensional features of their gene expression data are too sparse to allow most methods to achieve desirable classification results. RESULTS: In this paper, we propose a deep learning approach by combining a convolutional neural network (CNN) and bidirectional gated recurrent unit (BiGRU): our approach, DCGN, aims to achieve nonlinear dimensionality reduction and learn features to eliminate irrelevant factors in gene expression data. Specifically, DCGN first uses the synthetic minority oversampling technique algorithm to equalize data. The CNN can handle high-dimensional data without stress and extract important local features, and the BiGRU can analyse deep features and retain their important information; the DCGN captures key features by combining both neural networks to overcome the challenges of small sample sizes and sparse, high-dimensional features. In the experiments, we compared the DCGN to seven other cancer subtype classification methods using breast and bladder cancer gene expression datasets. The experimental results show that the DCGN performs better than the other seven methods and can provide more satisfactory classification results. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12859-022-04980-9. |
format | Online Article Text |
id | pubmed-9575247 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-9575247 2022-10-18 Deep learning approach for cancer subtype classification using high-dimensional gene expression data Shen, Jiquan Shi, Jiawei Luo, Junwei Zhai, Haixia Liu, Xiaoyan Wu, Zhengjiang Yan, Chaokun Luo, Huimin BMC Bioinformatics Research BioMed Central 2022-10-17 /pmc/articles/PMC9575247/ /pubmed/36253710 http://dx.doi.org/10.1186/s12859-022-04980-9 Text en © The Author(s) 2022 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data. |
title | Deep learning approach for cancer subtype classification using high-dimensional gene expression data |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9575247/ https://www.ncbi.nlm.nih.gov/pubmed/36253710 http://dx.doi.org/10.1186/s12859-022-04980-9 |
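
The description in this record states that DCGN first applies the synthetic minority oversampling technique (SMOTE) to balance the classes before training. The block below is a minimal sketch of that general idea only, not the authors' implementation: each synthetic sample is placed at a random point on the line segment between a minority-class sample and one of its nearest minority-class neighbours. The function names `smote_oversample` and `balance`, the parameters `k` and `seed`, and the subtype labels and two-feature toy vectors are all assumptions made for illustration; real gene expression vectors would have thousands of features.

```python
import math
import random


def smote_oversample(minority, n_new, k=5, seed=0):
    """Return n_new synthetic points interpolated between minority samples
    and their k nearest minority-class neighbours (the core SMOTE idea)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other samples
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def balance(samples_by_class, k=5, seed=0):
    """Oversample every class up to the size of the largest class."""
    target = max(len(samples) for samples in samples_by_class.values())
    balanced = {}
    for offset, (label, samples) in enumerate(samples_by_class.items()):
        extra = []
        if len(samples) < target:
            extra = smote_oversample(samples, target - len(samples), k, seed + offset)
        balanced[label] = list(samples) + extra
    return balanced


if __name__ == "__main__":
    # Hypothetical toy data: two made-up subtype labels, two features each.
    toy = {
        "subtype_A": [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
        "subtype_B": [[5.0, 5.0], [6.0, 5.0]],
    }
    for label, samples in balance(toy, k=1).items():
        print(label, len(samples))
```

Because synthetic points are interpolated rather than copied, the minority class gains variety without leaving the region its real samples already occupy. In practice a maintained implementation, such as the one in the imbalanced-learn package, would usually be preferable to a hand-written version.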