Tumor type classification and candidate cancer-specific biomarkers discovery via semi-supervised learning

Identifying cancer-related differentially expressed genes provides significant information for diagnosing tumors, predicting prognoses, and selecting effective treatments. Recently, deep learning methods have been used to perform gene differential expression analysis using microarray-based high-throughput gen...

Bibliographic Details
Main authors: Chen, Peng; Li, Zhenlei; Hong, Zhaolin; Zheng, Haoran; Zeng, Rong
Format: Online Article Text
Language: English
Published: Biophysics Reports Editorial Office 2023
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10518520/
https://www.ncbi.nlm.nih.gov/pubmed/37753058
http://dx.doi.org/10.52601/bpr.2023.230005
author Chen, Peng
Li, Zhenlei
Hong, Zhaolin
Zheng, Haoran
Zeng, Rong
collection PubMed
description Identifying cancer-related differentially expressed genes provides significant information for diagnosing tumors, predicting prognoses, and selecting effective treatments. Recently, deep learning methods have been used to perform gene differential expression analysis on microarray-based high-throughput gene profiling data and have achieved good results. In this study, we propose a robust multiple-dataset semi-supervised learning model, MSSL, for tumor type classification and candidate cancer-specific biomarker discovery across multiple tumor types and multiple datasets. It addresses three long-standing obstacles: (1) a single existing dataset does not contain enough data to fully exploit the advantages of deep learning; (2) many datasets from different research institutions cannot be used effectively because of inconsistent internal variances and low quality; (3) deep learning methods have limited effectiveness on relatively uncommon cancers. We applied MSSL to pan-cancer normalized level 3 RNA-seq data from The Cancer Genome Atlas (TCGA) and the Gene Expression Omnibus (GEO) and achieved a final classification accuracy of 97.6%, a significant improvement over previous approaches. Finally, we ranked the importance of genes for each cancer type based on the classification results and validated that the top-ranked genes were biologically meaningful for the corresponding tumors and that some of them had already been used as biomarkers, which demonstrates the efficacy of our method.
format Online
Article
Text
id pubmed-10518520
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Biophysics Reports Editorial Office
record_format MEDLINE/PubMed
spelling pubmed-10518520 2023-09-26 Tumor type classification and candidate cancer-specific biomarkers discovery via semi-supervised learning Chen, Peng; Li, Zhenlei; Hong, Zhaolin; Zheng, Haoran; Zeng, Rong Biophys Rep Method Biophysics Reports Editorial Office 2023-04-30 /pmc/articles/PMC10518520/ /pubmed/37753058 http://dx.doi.org/10.52601/bpr.2023.230005 Text en © The Author(s) 2023 https://creativecommons.org/licenses/by/4.0/ This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
title Tumor type classification and candidate cancer-specific biomarkers discovery via semi-supervised learning
topic Method