Machine Learning Analysis of RNA-seq Data for Diagnostic and Prognostic Prediction of Colon Cancer
Data from omics studies have been used for prediction and classification of various diseases in biomedical and bioinformatics research. In recent years, Machine Learning (ML) algorithms have been used in many different fields related to healthcare systems, especially for disease prediction and class...
Main authors: | Bostanci, Erkan; Kocak, Engin; Unal, Metehan; Guzel, Mehmet Serdar; Acici, Koray; Asuroglu, Tunc |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | MDPI 2023 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10052105/ https://www.ncbi.nlm.nih.gov/pubmed/36991790 http://dx.doi.org/10.3390/s23063080 |
_version_ | 1785015070269374464 |
---|---|
author | Bostanci, Erkan Kocak, Engin Unal, Metehan Guzel, Mehmet Serdar Acici, Koray Asuroglu, Tunc |
author_facet | Bostanci, Erkan Kocak, Engin Unal, Metehan Guzel, Mehmet Serdar Acici, Koray Asuroglu, Tunc |
author_sort | Bostanci, Erkan |
collection | PubMed |
description | Data from omics studies have been used for prediction and classification of various diseases in biomedical and bioinformatics research. In recent years, Machine Learning (ML) algorithms have been used in many different fields related to healthcare systems, especially for disease prediction and classification tasks. Integration of molecular omics data with ML algorithms has offered a great opportunity to evaluate clinical data. RNA sequencing (RNA-seq) analysis has emerged as the gold standard for transcriptomics analysis and is now widely used in clinical research. In our present work, RNA-seq data of extracellular vesicles (EV) from healthy individuals and colon cancer patients are analyzed. Our aim is to develop models for prediction and classification of colon cancer stages. Five canonical ML classifiers and three Deep Learning (DL) classifiers are used to predict colon cancer in an individual from processed RNA-seq data. The classes of data are formed on the basis of both colon cancer stages and cancer presence (healthy or cancer). The canonical ML classifiers, which are k-Nearest Neighbor (kNN), Logistic Model Tree (LMT), Random Tree (RT), Random Committee (RC), and Random Forest (RF), are tested with both forms of the data. In addition, to compare the performance with canonical ML models, One-Dimensional Convolutional Neural Network (1-D CNN), Long Short-Term Memory (LSTM), and Bidirectional LSTM (BiLSTM) DL models are utilized. The hyper-parameters of the DL models are optimized with a genetic algorithm (GA), a meta-heuristic optimization method. Among the canonical ML algorithms, RC, LMT, and RF obtain the best accuracy in cancer prediction, 97.33%, while RT and kNN reach 95.33%. In cancer stage classification, RF achieves the best accuracy, 97.33%, followed by LMT, RC, kNN, and RT with 96.33%, 96%, 94.66%, and 94%, respectively. Among the DL algorithms, 1-D CNN obtains the best accuracy in cancer prediction, 97.67%, while BiLSTM and LSTM reach 94.33% and 93.67%, respectively. In cancer stage classification, BiLSTM achieves the best accuracy, 98%, while 1-D CNN and LSTM reach 97% and 94.33%, respectively. The results show that either canonical ML or DL models may outperform the other, depending on the number of features. |
format | Online Article Text |
id | pubmed-10052105 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | MDPI |
record_format | MEDLINE/PubMed |
spelling | pubmed-10052105 2023-03-30 Machine Learning Analysis of RNA-seq Data for Diagnostic and Prognostic Prediction of Colon Cancer Bostanci, Erkan Kocak, Engin Unal, Metehan Guzel, Mehmet Serdar Acici, Koray Asuroglu, Tunc Sensors (Basel) Article MDPI 2023-03-13 /pmc/articles/PMC10052105/ /pubmed/36991790 http://dx.doi.org/10.3390/s23063080 Text en © 2023 by the authors. https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). |
spellingShingle | Article Bostanci, Erkan Kocak, Engin Unal, Metehan Guzel, Mehmet Serdar Acici, Koray Asuroglu, Tunc Machine Learning Analysis of RNA-seq Data for Diagnostic and Prognostic Prediction of Colon Cancer |
title | Machine Learning Analysis of RNA-seq Data for Diagnostic and Prognostic Prediction of Colon Cancer |
title_full | Machine Learning Analysis of RNA-seq Data for Diagnostic and Prognostic Prediction of Colon Cancer |
title_fullStr | Machine Learning Analysis of RNA-seq Data for Diagnostic and Prognostic Prediction of Colon Cancer |
title_full_unstemmed | Machine Learning Analysis of RNA-seq Data for Diagnostic and Prognostic Prediction of Colon Cancer |
title_short | Machine Learning Analysis of RNA-seq Data for Diagnostic and Prognostic Prediction of Colon Cancer |
title_sort | machine learning analysis of rna-seq data for diagnostic and prognostic prediction of colon cancer |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10052105/ https://www.ncbi.nlm.nih.gov/pubmed/36991790 http://dx.doi.org/10.3390/s23063080 |
work_keys_str_mv | AT bostancierkan machinelearninganalysisofrnaseqdatafordiagnosticandprognosticpredictionofcoloncancer AT kocakengin machinelearninganalysisofrnaseqdatafordiagnosticandprognosticpredictionofcoloncancer AT unalmetehan machinelearninganalysisofrnaseqdatafordiagnosticandprognosticpredictionofcoloncancer AT guzelmehmetserdar machinelearninganalysisofrnaseqdatafordiagnosticandprognosticpredictionofcoloncancer AT acicikoray machinelearninganalysisofrnaseqdatafordiagnosticandprognosticpredictionofcoloncancer AT asuroglutunc machinelearninganalysisofrnaseqdatafordiagnosticandprognosticpredictionofcoloncancer |
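
The description field above names k-Nearest Neighbor (kNN) as one of the canonical classifiers and reports its accuracy on a healthy-versus-cancer task. As a rough illustration of what such an evaluation involves, the following is a minimal sketch in plain Python. It is not the authors' pipeline: the toy Gaussian data, the helper names (`knn_predict`, `cross_val_accuracy`, `make_toy_data`), the choice of k = 3, and the 5-fold split are all assumptions made for illustration, and no real RNA-seq data is used.

```python
import math
import random
from collections import Counter


def knn_predict(train_X, train_y, x, k=3):
    """Label x by majority vote among its k nearest training points."""
    # Rank training samples by Euclidean distance to the query point.
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def cross_val_accuracy(X, y, k=3, folds=5, seed=0):
    """Accuracy of kNN pooled over all held-out folds of a shuffled split."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    correct = 0
    for f in range(folds):
        test = idx[f::folds]  # every folds-th shuffled index forms one fold
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        train_X = [X[i] for i in train]
        train_y = [y[i] for i in train]
        correct += sum(knn_predict(train_X, train_y, X[i], k) == y[i] for i in test)
    return correct / len(X)


def make_toy_data(n_per_class=30, n_features=5, seed=1):
    """Hypothetical stand-in for expression profiles: two Gaussian clusters."""
    rng = random.Random(seed)
    X, y = [], []
    for label, center in (("healthy", 0.0), ("cancer", 4.0)):
        for _ in range(n_per_class):
            X.append([rng.gauss(center, 1.0) for _ in range(n_features)])
            y.append(label)
    return X, y


X, y = make_toy_data()
accuracy = cross_val_accuracy(X, y, k=3, folds=5)
print(f"5-fold kNN accuracy on toy data: {accuracy:.2%}")
```

Because the two toy clusters are well separated, the accuracy printed here says nothing about the figures reported in the record; the sketch only shows the mechanics of holding out folds and scoring predictions.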