Machine Learning Methods for Cancer Classification Using Gene Expression Data: A Review

Cancer is a term that denotes a group of diseases caused by the abnormal growth of cells that can spread to different parts of the body. According to the World Health Organization (WHO), cancer is the second leading cause of death after cardiovascular diseases. Gene expression can play a fundamental r...

Bibliographic Details
Main authors: Alharbi, Fadi, Vakanski, Aleksandar
Format: Online Article Text
Language: English
Published: MDPI 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9952758/
https://www.ncbi.nlm.nih.gov/pubmed/36829667
http://dx.doi.org/10.3390/bioengineering10020173
author Alharbi, Fadi
Vakanski, Aleksandar
collection PubMed
description Cancer is a term that denotes a group of diseases caused by the abnormal growth of cells that can spread to different parts of the body. According to the World Health Organization (WHO), cancer is the second leading cause of death after cardiovascular diseases. Gene expression can play a fundamental role in the early detection of cancer, as it is indicative of the biochemical processes in tissue and cells, as well as the genetic characteristics of an organism. Deoxyribonucleic acid (DNA) microarrays and ribonucleic acid (RNA)-sequencing methods make it possible to quantify the expression levels of genes and produce valuable data for computational analysis. This study reviews recent progress in gene expression analysis for cancer classification using machine learning methods. Both conventional and deep learning-based approaches are reviewed, with an emphasis on deep learning models because of their comparative advantages in identifying gene patterns that are distinctive for various types of cancer. Relevant works that employ the most commonly used deep neural network architectures are covered, including multi-layer perceptrons, as well as convolutional, recurrent, graph, and transformer networks. This survey also presents an overview of the data collection methods for gene expression analysis and lists important datasets that are commonly used for supervised machine learning on this task. Furthermore, we review pertinent techniques for feature engineering and data preprocessing that are typically used to handle the high dimensionality of gene expression data, which is caused by the large number of genes present in data samples. The paper concludes with a discussion of future research directions for machine learning-based gene expression analysis for cancer classification.
format Online
Article
Text
id pubmed-9952758
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-9952758 2023-02-25 Machine Learning Methods for Cancer Classification Using Gene Expression Data: A Review. Alharbi, Fadi; Vakanski, Aleksandar. Bioengineering (Basel). Review. MDPI 2023-01-28 /pmc/articles/PMC9952758/ /pubmed/36829667 http://dx.doi.org/10.3390/bioengineering10020173 Text en © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title Machine Learning Methods for Cancer Classification Using Gene Expression Data: A Review
topic Review