
A graph-based gene selection method for medical diagnosis problems using a many-objective PSO algorithm

BACKGROUND: Gene expression data play an important role in bioinformatics applications. Although there may be a large number of features in such data, they mainly tend to contain only a few samples. This can negatively impact the performance of data mining and machine learning algorithms. One of the...


Bibliographic Details
Main authors: Azadifar, Saeid; Ahmadi, Ali
Format: Online Article Text
Language: English
Published: BioMed Central 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8627636/
https://www.ncbi.nlm.nih.gov/pubmed/34838034
http://dx.doi.org/10.1186/s12911-021-01696-3
_version_ 1784606874087194624
author Azadifar, Saeid
Ahmadi, Ali
author_facet Azadifar, Saeid
Ahmadi, Ali
author_sort Azadifar, Saeid
collection PubMed
description BACKGROUND: Gene expression data play an important role in bioinformatics applications. Although such data may contain a large number of features, they typically include only a few samples. This can negatively impact the performance of data mining and machine learning algorithms. One of the most effective approaches to alleviate this problem is to use gene selection methods. The aim of gene selection is to reduce the dimensions (features) of gene expression data by eliminating irrelevant and redundant genes. METHODS: This paper presents a hybrid gene selection method based on graph theory and a many-objective particle swarm optimization (PSO) algorithm. To this end, a filter method is first utilized to reduce the initial space of the genes. Then, the gene space is represented as a graph, and a graph clustering method is applied to group the genes into several clusters. Moreover, the many-objective PSO algorithm is utilized to search for an optimal subset of genes according to several criteria, which include classification error, node centrality, specificity, edge centrality, and the number of selected genes. A repair operator is proposed to cover the whole space of the genes and ensure that at least one gene is selected from each cluster. This leads to an increase in the diversity of the selected genes. RESULTS: To evaluate the performance of the proposed method, extensive experiments are conducted based on seven datasets and two evaluation measures. In addition, three classifiers, namely Decision Tree (DT), Support Vector Machine (SVM), and K-Nearest Neighbors (KNN), are utilized to compare the effectiveness of the proposed gene selection method with other state-of-the-art methods. The results of these experiments demonstrate that the proposed method not only achieves more accurate classification, but also selects fewer genes than other methods. CONCLUSION: This study shows that the proposed multi-objective PSO algorithm simultaneously removes irrelevant and redundant features using several different criteria. Also, the use of the clustering algorithm and the repair operator has improved the performance of the proposed method by covering the whole space of the problem.
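The repair step described in the abstract (at least one gene selected from each cluster) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function name `repair`, the representation of a candidate solution as a set of gene indices, and the rule of adding one uniformly random gene from each uncovered cluster are all assumptions made for illustration only.

```python
import random


def repair(selected, clusters, rng=None):
    """Return a copy of `selected` that covers every cluster.

    selected: set of gene indices chosen by one candidate solution.
    clusters: list of lists of gene indices, one list per cluster.
    rng:      optional random.Random instance (hypothetical parameter).
    """
    if rng is None:
        rng = random.Random(0)  # fixed seed so the sketch is repeatable
    repaired = set(selected)
    for members in clusters:
        # A cluster is "uncovered" when none of its genes is selected.
        if members and not repaired.intersection(members):
            # Assumed rule: add one member chosen uniformly at random.
            repaired.add(rng.choice(members))
    return repaired


# Hypothetical example: three clusters, the middle and last ones
# checked for coverage after repair.
clusters = [[0, 1, 2], [3, 4], [5, 6, 7]]
subset = {0, 4}
fixed = repair(subset, clusters)
```

In this example the first two clusters are already covered by genes 0 and 4, so the sketch adds exactly one gene from the third cluster and leaves the original selection untouched.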
format Online
Article
Text
id pubmed-8627636
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-8627636 2021-11-30 A graph-based gene selection method for medical diagnosis problems using a many-objective PSO algorithm Azadifar, Saeid Ahmadi, Ali BMC Med Inform Decis Mak Research BioMed Central 2021-11-27 /pmc/articles/PMC8627636/ /pubmed/34838034 http://dx.doi.org/10.1186/s12911-021-01696-3 Text en © The Author(s) 2021 https://creativecommons.org/licenses/by/4.0/ Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title A graph-based gene selection method for medical diagnosis problems using a many-objective PSO algorithm
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8627636/
https://www.ncbi.nlm.nih.gov/pubmed/34838034
http://dx.doi.org/10.1186/s12911-021-01696-3