A comparative study of machine learning and deep learning algorithms to classify cancer types based on microarray gene expression data

Cancer classification is a topic of major interest in medicine since it allows accurate and efficient diagnosis and facilitates a successful outcome in medical treatments. Previous studies have classified human tumors using large-scale RNA profiling and supervised Machine Learning (ML) algorithms...

Bibliographic Details
Main authors: Tabares-Soto, Reinel; Orozco-Arias, Simon; Romero-Cano, Victor; Segovia Bucheli, Vanesa; Rodríguez-Sotelo, José Luis; Jiménez-Varón, Cristian Felipe
Format: Online Article Text
Language: English
Published: PeerJ Inc. 2020
Subjects: Bioinformatics
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7924492/
https://www.ncbi.nlm.nih.gov/pubmed/33816921
http://dx.doi.org/10.7717/peerj-cs.270
author Tabares-Soto, Reinel
Orozco-Arias, Simon
Romero-Cano, Victor
Segovia Bucheli, Vanesa
Rodríguez-Sotelo, José Luis
Jiménez-Varón, Cristian Felipe
collection PubMed
description Cancer classification is a topic of major interest in medicine since it allows accurate and efficient diagnosis and facilitates a successful outcome in medical treatments. Previous studies have classified human tumors using large-scale RNA profiling and supervised Machine Learning (ML) algorithms to construct a molecular-based classification of carcinoma cells from breast, bladder, adenocarcinoma, colorectal, gastro esophagus, kidney, liver, lung, ovarian, pancreas, and prostate tumors. These datasets are collectively known as the 11_tumor database. Although this database has been used in several works in the ML field, no comparative studies of different algorithms can be found in the literature. Meanwhile, advances in both hardware and software technologies have fostered considerable improvements in the precision of solutions that use ML, such as Deep Learning (DL). In this study, we compare the most widely used algorithms in classical ML and DL to classify the tumors described in the 11_tumor database. We obtained tumor identification accuracies between 90.6% (Logistic Regression) and 94.43% (Convolutional Neural Networks) using k-fold cross-validation. We also show how a tuning process may or may not significantly improve algorithms’ accuracies. Our results demonstrate an efficient and accurate classification method based on gene expression (microarray data) and ML/DL algorithms, which facilitates tumor type prediction in a multi-cancer-type scenario.
format Online
Article
Text
id pubmed-7924492
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher PeerJ Inc.
record_format MEDLINE/PubMed
spelling pubmed-7924492 2021-04-02
A comparative study of machine learning and deep learning algorithms to classify cancer types based on microarray gene expression data
Tabares-Soto, Reinel; Orozco-Arias, Simon; Romero-Cano, Victor; Segovia Bucheli, Vanesa; Rodríguez-Sotelo, José Luis; Jiménez-Varón, Cristian Felipe
PeerJ Comput Sci
Bioinformatics
PeerJ Inc. 2020-04-13
/pmc/articles/PMC7924492/ /pubmed/33816921 http://dx.doi.org/10.7717/peerj-cs.270
Text en © 2020 Tabares-Soto et al.
https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Computer Science) and either DOI or URL of the article must be cited.
title A comparative study of machine learning and deep learning algorithms to classify cancer types based on microarray gene expression data
topic Bioinformatics
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7924492/
https://www.ncbi.nlm.nih.gov/pubmed/33816921
http://dx.doi.org/10.7717/peerj-cs.270