
Hybrid Method Based on Information Gain and Support Vector Machine for Gene Selection in Cancer Classification


Bibliographic Details
Main authors: Gao, Lingyun; Ye, Mingquan; Lu, Xiaojie; Huang, Daobin
Format: Online, Article, Text
Language: English
Published: Elsevier 2017
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5828665/
https://www.ncbi.nlm.nih.gov/pubmed/29246519
http://dx.doi.org/10.1016/j.gpb.2017.08.002
collection PubMed
description Achieving sufficient cancer classification accuracy with the entire set of genes remains a great challenge, because gene expression data are high-dimensional, have small sample sizes, and contain substantial noise. In this study, we therefore proposed a hybrid gene selection method, Information Gain-Support Vector Machine (IG-SVM). IG was first employed to filter out irrelevant and redundant genes. SVM was then used to remove further redundant genes and to eliminate noise in the datasets more effectively. Finally, the informative genes selected by IG-SVM served as the input for the LIBSVM classifier. Evaluated on five cancer gene expression datasets, IG-SVM achieved the highest classification accuracy and superior performance compared with other related algorithms, while using only a few selected genes. For example, IG-SVM achieved a classification accuracy of 90.32% for colon cancer, which is difficult to classify accurately, using only three genes: CSRP1, MYL9, and GUCA2B.
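The abstract describes a two-stage selection pipeline: an IG filter, followed by SVM-based removal of further genes, with the survivors passed to LIBSVM. The sketch below illustrates that shape in plain Python under stated assumptions. Expression values are binarized at each gene's mean before computing IG, and the SVM stage is modeled as SVM-RFE-style recursive elimination (drop the gene with the smallest |w|) using a hand-written Pegasos linear SVM without a bias term. Neither the discretization nor the elimination rule is taken from the paper; the function names and parameters (`ig_filter`, `svm_eliminate`, `train_linear_svm`, `lam`, `epochs`, `k`, `keep`) are hypothetical, the toy matrix is invented rather than drawn from the paper's datasets, and the final LIBSVM classification step is not reproduced.

```python
import math
import random
from collections import Counter
from typing import List, Sequence

Matrix = List[List[float]]  # rows = samples, columns = genes


def entropy(labels: Sequence[int]) -> float:
    """Shannon entropy, in bits, of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(values: Sequence[float], labels: Sequence[int], threshold: float) -> float:
    """IG(Y; X) = H(Y) - sum_v P(X = v) * H(Y | X = v), with X = [value > threshold]."""
    n = len(labels)
    high = [y for x, y in zip(values, labels) if x > threshold]
    low = [y for x, y in zip(values, labels) if x <= threshold]
    conditional = sum(len(part) / n * entropy(part) for part in (high, low) if part)
    return entropy(labels) - conditional


def ig_filter(X: Matrix, y: List[int], k: int) -> List[int]:
    """Stage 1: keep the k genes with the highest IG (hypothetical mean-split binarization)."""
    n_genes = len(X[0])
    scores = []
    for g in range(n_genes):
        column = [row[g] for row in X]
        scores.append(information_gain(column, y, sum(column) / len(column)))
    return sorted(range(n_genes), key=lambda g: scores[g], reverse=True)[:k]


def train_linear_svm(X: Matrix, y: List[int], lam: float = 0.1,
                     epochs: int = 200, seed: int = 0) -> List[float]:
    """Pegasos sub-gradient solver for a linear SVM without bias; labels are +1/-1."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    t = 0
    for _ in range(epochs):
        for i in rng.sample(range(len(X)), len(X)):  # one shuffled pass over samples
            t += 1
            eta = 1.0 / (lam * t)
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, X[i]))
            w = [(1.0 - eta * lam) * wj for wj in w]  # regularization shrink step
            if margin < 1.0:  # hinge loss active: step toward y_i * x_i
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w


def svm_eliminate(X: Matrix, y: List[int], genes: List[int], keep: int) -> List[int]:
    """Stage 2 (SVM-RFE-style assumption): retrain, drop the smallest-|w| gene, repeat."""
    genes = list(genes)
    while len(genes) > keep:
        sub = [[row[g] for g in genes] for row in X]
        w = train_linear_svm(sub, y)
        genes.pop(min(range(len(genes)), key=lambda j: abs(w[j])))
    return genes


if __name__ == "__main__":
    # Invented toy data: gene 0 separates the classes, genes 1 and 2 are noise.
    X = [[2.0, 0.1, 0.5], [1.8, -0.2, 0.4], [2.2, 0.3, 0.6],
         [-2.0, 0.2, 0.5], [-1.9, -0.1, 0.4], [-2.1, 0.0, 0.6]]
    y = [1, 1, 1, -1, -1, -1]
    candidates = ig_filter(X, y, k=2)
    selected = svm_eliminate(X, y, candidates, keep=1)
    print(candidates, selected)  # prints [0, 1] [0]
```

Because Pegasos keeps w as a scaled sum of the margin-violating y_i x_i vectors, a gene whose values consistently separate the classes accumulates weight at every update, which is why gene 0 survives elimination in the toy run. On real expression data, genes would typically be standardized first so that |w| is comparable across genes.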
id pubmed-5828665
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-5828665; 2018-02-28; Genomics Proteomics Bioinformatics; Method; Elsevier 2017-12 (2017-12-12); © 2017 Beijing Institute of Genomics, Chinese Academy of Sciences and Genetics Society of China; This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
topic Method