Cargando…

Transcriptome profiling by combined machine learning and statistical R analysis identifies TMEM236 as a potential novel diagnostic biomarker for colorectal cancer

Colorectal cancer (CRC) is a common cause of cancer-related deaths worldwide. The CRC mRNA gene expression dataset containing 644 CRC tumor and 51 normal samples from the cancer genome atlas (TCGA) was pre-processed to identify the significant differentially expressed genes (DEGs). Feature selection...

Descripción completa

Detalles Bibliográficos
Autores principales: Maurya, Neha Shree, Kushwaha, Sandeep, Chawade, Aakash, Mani, Ashutosh
Formato: Online Artículo Texto
Lenguaje:English
Publicado: Nature Publishing Group UK 2021
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8275802/
https://www.ncbi.nlm.nih.gov/pubmed/34253750
http://dx.doi.org/10.1038/s41598-021-92692-0
_version_ 1783721783574134784
author Maurya, Neha Shree
Kushwaha, Sandeep
Chawade, Aakash
Mani, Ashutosh
author_facet Maurya, Neha Shree
Kushwaha, Sandeep
Chawade, Aakash
Mani, Ashutosh
author_sort Maurya, Neha Shree
collection PubMed
description Colorectal cancer (CRC) is a common cause of cancer-related deaths worldwide. The CRC mRNA gene expression dataset containing 644 CRC tumor and 51 normal samples from the cancer genome atlas (TCGA) was pre-processed to identify the significant differentially expressed genes (DEGs). Feature selection techniques Least absolute shrinkage and selection operator (LASSO) and Relief were used along with class balancing for obtaining features (genes) of high importance. The classification of the CRC dataset was done by ML algorithms namely, random forest (RF), K-nearest neighbour (KNN), and artificial neural networks (ANN). The significant DEGs were 2933, having 1832 upregulated and 1101 downregulated genes. The CRC gene expression dataset had 23,186 features. LASSO had performed better than Relief for classifying tumor and normal samples through ML algorithms namely RF, KNN, and ANN with an accuracy of 100%, while Relief had given 79.5%, 85.05%, and 100% respectively. Common features between LASSO and DEGs were 38, from them only 5 common genes namely, VSTM2A, NR5A2, TMEM236, GDLN, and ETFDH had shown statistically significant survival analysis. Functional review and analysis of the selected genes helped in downsizing the 5 genes to 2, which are VSTM2A and TMEM236. Differential expression of TMEM236 was statistically significant and was markedly reduced in the dataset which solicits appreciation for assessment as a novel biomarker for CRC diagnosis.
format Online
Article
Text
id pubmed-8275802
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Nature Publishing Group UK
record_format MEDLINE/PubMed
spelling pubmed-82758022021-07-13 Transcriptome profiling by combined machine learning and statistical R analysis identifies TMEM236 as a potential novel diagnostic biomarker for colorectal cancer Maurya, Neha Shree Kushwaha, Sandeep Chawade, Aakash Mani, Ashutosh Sci Rep Article Colorectal cancer (CRC) is a common cause of cancer-related deaths worldwide. The CRC mRNA gene expression dataset containing 644 CRC tumor and 51 normal samples from the cancer genome atlas (TCGA) was pre-processed to identify the significant differentially expressed genes (DEGs). Feature selection techniques Least absolute shrinkage and selection operator (LASSO) and Relief were used along with class balancing for obtaining features (genes) of high importance. The classification of the CRC dataset was done by ML algorithms namely, random forest (RF), K-nearest neighbour (KNN), and artificial neural networks (ANN). The significant DEGs were 2933, having 1832 upregulated and 1101 downregulated genes. The CRC gene expression dataset had 23,186 features. LASSO had performed better than Relief for classifying tumor and normal samples through ML algorithms namely RF, KNN, and ANN with an accuracy of 100%, while Relief had given 79.5%, 85.05%, and 100% respectively. Common features between LASSO and DEGs were 38, from them only 5 common genes namely, VSTM2A, NR5A2, TMEM236, GDLN, and ETFDH had shown statistically significant survival analysis. Functional review and analysis of the selected genes helped in downsizing the 5 genes to 2, which are VSTM2A and TMEM236. Differential expression of TMEM236 was statistically significant and was markedly reduced in the dataset which solicits appreciation for assessment as a novel biomarker for CRC diagnosis. Nature Publishing Group UK 2021-07-12 /pmc/articles/PMC8275802/ /pubmed/34253750 http://dx.doi.org/10.1038/s41598-021-92692-0 Text en © The Author(s) 2021 https://creativecommons.org/licenses/by/4.0/Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ (https://creativecommons.org/licenses/by/4.0/) .
spellingShingle Article
Maurya, Neha Shree
Kushwaha, Sandeep
Chawade, Aakash
Mani, Ashutosh
Transcriptome profiling by combined machine learning and statistical R analysis identifies TMEM236 as a potential novel diagnostic biomarker for colorectal cancer
title Transcriptome profiling by combined machine learning and statistical R analysis identifies TMEM236 as a potential novel diagnostic biomarker for colorectal cancer
title_full Transcriptome profiling by combined machine learning and statistical R analysis identifies TMEM236 as a potential novel diagnostic biomarker for colorectal cancer
title_fullStr Transcriptome profiling by combined machine learning and statistical R analysis identifies TMEM236 as a potential novel diagnostic biomarker for colorectal cancer
title_full_unstemmed Transcriptome profiling by combined machine learning and statistical R analysis identifies TMEM236 as a potential novel diagnostic biomarker for colorectal cancer
title_short Transcriptome profiling by combined machine learning and statistical R analysis identifies TMEM236 as a potential novel diagnostic biomarker for colorectal cancer
title_sort transcriptome profiling by combined machine learning and statistical r analysis identifies tmem236 as a potential novel diagnostic biomarker for colorectal cancer
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8275802/
https://www.ncbi.nlm.nih.gov/pubmed/34253750
http://dx.doi.org/10.1038/s41598-021-92692-0
work_keys_str_mv AT mauryanehashree transcriptomeprofilingbycombinedmachinelearningandstatisticalranalysisidentifiestmem236asapotentialnoveldiagnosticbiomarkerforcolorectalcancer
AT kushwahasandeep transcriptomeprofilingbycombinedmachinelearningandstatisticalranalysisidentifiestmem236asapotentialnoveldiagnosticbiomarkerforcolorectalcancer
AT chawadeaakash transcriptomeprofilingbycombinedmachinelearningandstatisticalranalysisidentifiestmem236asapotentialnoveldiagnosticbiomarkerforcolorectalcancer
AT maniashutosh transcriptomeprofilingbycombinedmachinelearningandstatisticalranalysisidentifiestmem236asapotentialnoveldiagnosticbiomarkerforcolorectalcancer