
Unlocking the Potential of the CA2, CA7, and ITM2C Gene Signatures for the Early Detection of Colorectal Cancer: A Comprehensive Analysis of RNA-Seq Data by Utilizing Machine Learning Algorithms

Colorectal cancer (CRC) affects the colon or rectum and is a common global health issue, with 1.1 million new cases occurring yearly. The study aimed to identify gene signatures for the early detection of CRC by applying machine learning (ML) algorithms to gene expression data. The TCGA-CRC and GSE50760...


Bibliographic Details
Main authors: Maurya, Neha Shree, Kushwaha, Sandeep, Vetukuri, Ramesh Raju, Mani, Ashutosh
Format: Online Article Text
Language: English
Published: MDPI 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10606805/
https://www.ncbi.nlm.nih.gov/pubmed/37895185
http://dx.doi.org/10.3390/genes14101836
author Maurya, Neha Shree
Kushwaha, Sandeep
Vetukuri, Ramesh Raju
Mani, Ashutosh
collection PubMed
description Colorectal cancer (CRC) affects the colon or rectum and is a common global health issue, with 1.1 million new cases occurring yearly. The study aimed to identify gene signatures for the early detection of CRC by applying machine learning (ML) algorithms to gene expression data. The TCGA-CRC and GSE50760 datasets were pre-processed and subjected to feature selection using the LASSO method in combination with five ML algorithms: AdaBoost, Random Forest (RF), Logistic Regression (LR), Gaussian Naive Bayes (GNB), and Support Vector Machine (SVM). The important features were further examined through gene expression, correlation, and survival analyses. Validation on the external dataset GSE142279 was also performed. The RF model had the best classification accuracy for both datasets. Feature selection identified 12 candidate genes, which were subsequently reduced to 3 (CA2, CA7, and ITM2C) through gene expression and correlation analyses. These three genes achieved 100% accuracy in an external dataset. The AUC values for these genes were 99.24%, 100%, and 99.5%, respectively. The survival analysis showed a significant logrank p-value of 0.044 for the final gene signatures. The analysis of tumor immunocyte infiltration showed a weak correlation with the expression of the gene signatures. CA2, CA7, and ITM2C can serve as gene signatures for the early detection of CRC and may provide valuable information for prognostic and therapeutic decision making. Further research is needed to fully understand the potential of these genes in the context of CRC.
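The per-gene AUC values quoted in the description can be read as the probability that a randomly chosen tumor sample scores higher than a randomly chosen normal sample. The following is a minimal sketch of that calculation, not the authors' code: the function name `roc_auc`, the expression values, and the assumption that the gene is expressed at lower levels in tumor tissue are all hypothetical and chosen only for illustration.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Rank-based AUC: fraction of (positive, negative) pairs in which
    the positive sample has the higher score; ties count as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one sample of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical expression values for one gene (invented, not study data).
expression = [2.1, 1.8, 2.5, 3.0, 7.9, 8.4, 2.7, 9.1]
is_tumor = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 = tumor, 0 = normal

# Assumption: lower expression suggests tumor, so negate the values
# to make a higher score mean "more tumor-like".
scores = [-x for x in expression]
auc = roc_auc(scores, is_tumor)
print(f"AUC = {auc:.4f}")  # prints "AUC = 0.9375"
```

With these toy values, 15 of the 16 tumor/normal pairs are ordered correctly, which gives 0.9375. An AUC of 100%, as reported for one of the genes, would mean every such pair is ordered correctly.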
format Online
Article
Text
id pubmed-10606805
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-10606805 2023-10-28 Genes (Basel) Article
MDPI 2023-09-22 /pmc/articles/PMC10606805/ /pubmed/37895185 http://dx.doi.org/10.3390/genes14101836 Text en © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title Unlocking the Potential of the CA2, CA7, and ITM2C Gene Signatures for the Early Detection of Colorectal Cancer: A Comprehensive Analysis of RNA-Seq Data by Utilizing Machine Learning Algorithms
topic Article