
A Gene-Based Machine Learning Classifier Associated to the Colorectal Adenoma—Carcinoma Sequence

Colorectal cancer (CRC) carcinogenesis is generally the result of the sequential mutation and deletion of various genes; this is known as the normal mucosa–adenoma–carcinoma sequence. The aim of this study was to develop a predictor-classifier during the “adenoma-carcinoma” sequence using microarray...

Bibliographic Details
Main authors: Lacalamita, Antonio; Piccinno, Emanuele; Scalavino, Viviana; Bellotti, Roberto; Giannelli, Gianluigi; Serino, Grazia
Format: Online Article Text
Language: English
Published: MDPI 2021
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8698794/
https://www.ncbi.nlm.nih.gov/pubmed/34944753
http://dx.doi.org/10.3390/biomedicines9121937
author Lacalamita, Antonio
Piccinno, Emanuele
Scalavino, Viviana
Bellotti, Roberto
Giannelli, Gianluigi
Serino, Grazia
collection PubMed
description Colorectal cancer (CRC) carcinogenesis is generally the result of the sequential mutation and deletion of various genes; this is known as the normal mucosa–adenoma–carcinoma sequence. The aim of this study was to develop a predictor-classifier during the “adenoma-carcinoma” sequence using microarray gene expression profiles of primary CRC, adenoma, and normal colon epithelial tissues. Four gene expression profiles from the Gene Expression Omnibus database, containing 465 samples (105 normal, 155 adenoma, and 205 CRC), were preprocessed to identify differentially expressed genes (DEGs) between adenoma tissue and primary CRC. The feature selection procedure, using the sequential Boruta algorithm and Stepwise Regression, determined 56 highly important genes. K-Means methods showed that, using the selected 56 DEGs, the three groups were clearly separate. The classification was performed with machine learning algorithms such as Linear Model (LM), Random Forest (RF), k-Nearest Neighbors (k-NN), and Artificial Neural Network (ANN). The best classification method in terms of accuracy (88.06 ± 0.70) and AUC (92.04 ± 0.47) was k-NN. To confirm the relevance of the predictive models, we applied the four models on a validation cohort: the k-NN model remained the best model in terms of performance, with 91.11% accuracy. Among the 56 DEGs, we identified 17 genes with an ascending or descending trend through the normal mucosa–adenoma–carcinoma sequence. Moreover, using the survival information of the TCGA database, we selected six DEGs related to patient prognosis (SCARA5, PKIB, CWH43, TEX11, METTL7A, and VEGFA). The six-gene-based classifier described in the current study could be used as a potential biomarker for the early diagnosis of CRC.
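The classification step named in the description (k-NN over the 56 selected genes, scored by accuracy) can be illustrated with a minimal sketch. This is not the authors' pipeline: the data are synthetic Gaussian values rather than GEO expression profiles, the class shift, `k=5`, and the 75/25 hold-out split are arbitrary assumptions, and the helper names (`make_synthetic`, `knn_predict`, `holdout_accuracy`) are hypothetical.

```python
import math
import random
from collections import Counter


def make_synthetic(n_per_class=40, n_features=56, seed=0):
    """Build toy 'expression' rows for three classes (assumed data, not GEO)."""
    rng = random.Random(seed)
    labels = ["normal", "adenoma", "CRC"]
    data = []
    for idx, label in enumerate(labels):
        for _ in range(n_per_class):
            # Each class mean is shifted by its index; noise has unit variance.
            row = [rng.gauss(float(idx), 1.0) for _ in range(n_features)]
            data.append((row, label))
    rng.shuffle(data)
    return data


def knn_predict(train, x, k=5):
    """Majority vote among the k training rows closest to x (Euclidean)."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def holdout_accuracy(data, k=5, test_fraction=0.25):
    """Fraction of held-out rows whose predicted label matches the true one."""
    split = int(len(data) * (1 - test_fraction))
    train, test = data[:split], data[split:]
    correct = sum(knn_predict(train, x, k) == y for x, y in test)
    return correct / len(test)


data = make_synthetic()
accuracy = holdout_accuracy(data)
print(f"hold-out accuracy on synthetic data: {accuracy:.2f}")
```

The printed figure reflects only how well separated the synthetic classes are; it says nothing about the accuracy or AUC values reported in the record.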
format Online
Article
Text
id pubmed-8698794
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-8698794 2021-12-24 A Gene-Based Machine Learning Classifier Associated to the Colorectal Adenoma—Carcinoma Sequence. Lacalamita, Antonio; Piccinno, Emanuele; Scalavino, Viviana; Bellotti, Roberto; Giannelli, Gianluigi; Serino, Grazia. Biomedicines. Article. MDPI 2021-12-17 /pmc/articles/PMC8698794/ /pubmed/34944753 http://dx.doi.org/10.3390/biomedicines9121937 Text en © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title A Gene-Based Machine Learning Classifier Associated to the Colorectal Adenoma—Carcinoma Sequence
topic Article