A novel gene selection algorithm for cancer classification using microarray datasets

BACKGROUND: Microarray datasets are an important medical diagnostic tool as they represent the states of a cell at the molecular level. Available microarray datasets for classifying cancer types generally have a fairly small sample size compared to the large number of genes involved. This fact is kn...

Bibliographic Details
Main Authors: Alanni, Russul, Hou, Jingyu, Azzawi, Hasseeb, Xiang, Yong
Format: Online Article Text
Language: English
Published: BioMed Central 2019
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6334429/
https://www.ncbi.nlm.nih.gov/pubmed/30646919
http://dx.doi.org/10.1186/s12920-018-0447-6
author Alanni, Russul
Hou, Jingyu
Azzawi, Hasseeb
Xiang, Yong
collection PubMed
description BACKGROUND: Microarray datasets are an important medical diagnostic tool because they represent the states of a cell at the molecular level. Available microarray datasets for classifying cancer types generally have a fairly small sample size compared to the large number of genes involved. This is known as the curse of dimensionality, and it is a challenging problem. Gene selection is a promising approach to this problem and plays an important role in the development of efficient cancer classification, because only a small number of genes are related to the classification problem. Gene selection addresses several problems in microarray datasets, such as reducing the number of irrelevant and noisy genes and selecting the most relevant genes to improve the classification results. METHODS: An innovative Gene Selection Programming (GSP) method is proposed to select relevant genes for effective and efficient cancer classification. GSP is based on the Gene Expression Programming (GEP) method, with a newly defined population initialization algorithm, a new fitness function definition, and improved mutation and recombination operators. A Support Vector Machine (SVM) with a linear kernel serves as the classifier of GSP. RESULTS: Experimental results on ten microarray cancer datasets demonstrate that GSP is effective and efficient in eliminating irrelevant and redundant genes/features from microarray datasets. Comprehensive evaluations and comparisons with other methods show that GSP gives a better compromise across all three evaluation criteria, i.e., classification accuracy, number of selected genes, and computational cost. The gene set selected by GSP shows superior performance in cancer classification compared to those selected by up-to-date representative gene selection methods. CONCLUSION: The gene subset selected by GSP can achieve a higher classification accuracy with less processing time.
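The abstract names two of the criteria a gene subset is judged on (classification accuracy and number of selected genes) but this record does not give the fitness function itself. The following is only a minimal sketch of how such a trade-off could be scored, under stated assumptions: `subset_fitness`, its `weight` parameter, and the toy data are hypothetical and are not the authors' definition, and a leave-one-out nearest-centroid classifier is used as a dependency-free stand-in for the linear-kernel SVM mentioned in the abstract.

```python
def subset_fitness(accuracy, n_selected, n_total, weight=0.9):
    """Hypothetical fitness: reward accuracy, penalise large gene subsets.

    This weighting is an illustrative assumption, not the GSP fitness function.
    """
    if not 0 < n_selected <= n_total:
        raise ValueError("n_selected must be between 1 and n_total")
    compactness = 1.0 - n_selected / n_total  # 0 when every gene is kept
    return weight * accuracy + (1.0 - weight) * compactness


def loo_accuracy(X, y, genes):
    """Leave-one-out accuracy of a nearest-centroid classifier restricted to
    the given gene (column) indices. Stand-in for a linear-kernel SVM."""
    correct = 0
    for i in range(len(X)):
        # Accumulate per-class sums over every sample except the held-out one.
        sums, counts = {}, {}
        for j, (row, label) in enumerate(zip(X, y)):
            if j == i:
                continue
            acc = sums.setdefault(label, [0.0] * len(genes))
            for k, g in enumerate(genes):
                acc[k] += row[g]
            counts[label] = counts.get(label, 0) + 1

        def sq_dist(label):
            # Squared Euclidean distance from sample i to the class centroid.
            return sum(
                (X[i][g] - sums[label][k] / counts[label]) ** 2
                for k, g in enumerate(genes)
            )

        predicted = min(sums, key=sq_dist)
        correct += predicted == y[i]
    return correct / len(X)


# Toy expression matrix (rows = samples, columns = genes); invented values.
# Gene 0 separates the two classes; genes 1 and 2 are noise.
X = [
    [1.0, 5.0, 0.3], [1.2, 4.0, 0.9], [0.9, 6.0, 0.1],
    [3.0, 5.5, 0.8], [3.2, 4.5, 0.2], [2.9, 5.0, 0.6],
]
y = [0, 0, 0, 1, 1, 1]

for genes in ([0], [0, 1, 2]):
    acc = loo_accuracy(X, y, genes)
    print(genes, acc, subset_fitness(acc, len(genes), n_total=3))
```

On this toy data both subsets classify equally well, so the smaller subset receives the higher score. That is the kind of compromise between accuracy and subset size the abstract describes, although the actual weighting used by GSP may differ.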
format Online
Article
Text
id pubmed-6334429
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-6334429 2019-01-23 A novel gene selection algorithm for cancer classification using microarray datasets Alanni, Russul Hou, Jingyu Azzawi, Hasseeb Xiang, Yong BMC Med Genomics Research Article BioMed Central 2019-01-15 /pmc/articles/PMC6334429/ /pubmed/30646919 http://dx.doi.org/10.1186/s12920-018-0447-6 Text en © The Author(s). 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title A novel gene selection algorithm for cancer classification using microarray datasets
topic Research Article