
Ensemble Feature Learning of Genomic Data Using Support Vector Machine

Bibliographic Details
Main authors: Anaissi, Ali; Goyal, Madhu; Catchpoole, Daniel R.; Braytee, Ali; Kennedy, Paul J.
Format: Online, Article, Text
Language: English
Published: Public Library of Science, 2016
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4909287/
https://www.ncbi.nlm.nih.gov/pubmed/27304923
http://dx.doi.org/10.1371/journal.pone.0157330
collection PubMed
description The identification of a subset of genes able to capture the information needed to distinguish classes of patients is crucial in bioinformatics applications. Ensemble and bagging methods have been shown to work effectively for gene selection and classification. A testament to that is random forest, which combines random decision trees with bagging to improve overall feature selection and classification accuracy. Surprisingly, the adoption of these methods in support vector machines (SVMs) has only recently received attention, and mostly for classification rather than gene selection. This paper introduces an ensemble SVM-Recursive Feature Elimination (ESVM-RFE) method for gene selection that follows the ensemble and bagging concepts used in random forest but adopts the backward elimination strategy underlying the RFE algorithm. The rationale is that building an ensemble of SVM models on bootstrap samples randomly drawn from the training set produces different feature rankings, which are subsequently aggregated into a single feature ranking. As a result, the decision to eliminate features is based on the rankings of multiple SVM models instead of one particular model. Moreover, this approach addresses the problem of imbalanced datasets by constructing nearly balanced bootstrap samples. Our experiments show that ESVM-RFE for gene selection substantially increased classification performance on five microarray datasets compared to state-of-the-art methods. Experiments on the childhood leukaemia dataset show that ESVM-RFE achieves on average 9% better accuracy than SVM-RFE and 5% better accuracy than a random forest-based approach. The genes selected by the ESVM-RFE algorithm were further explored with Singular Value Decomposition (SVD), which reveals significant clusters in the selected data.
id pubmed-4909287
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-4909287 2016-07-06
PLoS One
Public Library of Science 2016-06-15
© 2016 Anaissi et al. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
topic Research Article
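The record's description summarizes ESVM-RFE only at a high level. The Python sketch below illustrates the general idea and is not the authors' implementation. It rests on several assumptions that the record does not specify. Labels are coded +1/-1. Each "nearly balanced" bootstrap draws, with replacement, as many samples from each class as the minority class contains. Each ensemble member is a bias-free linear SVM trained with Pegasos-style stochastic subgradient descent, a swapped-in trainer chosen to keep the sketch dependency-free. Features are scored with the classic SVM-RFE criterion w_j², the per-model rankings are aggregated by summing rank positions, and one feature is eliminated per round. The function names (`balanced_bootstrap`, `train_linear_svm`, `esvm_rfe`) and all parameter defaults are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def balanced_bootstrap(y: Sequence[int], rng: random.Random) -> List[int]:
    """Return row indices of a class-balanced bootstrap sample.

    Assumption: each class contributes, with replacement, as many draws as
    the minority class has members. The paper's exact scheme may differ.
    """
    pos = [i for i, label in enumerate(y) if label == 1]
    neg = [i for i, label in enumerate(y) if label == -1]
    per_class = min(len(pos), len(neg))
    return ([rng.choice(pos) for _ in range(per_class)]
            + [rng.choice(neg) for _ in range(per_class)])


def train_linear_svm(rows: Matrix, labels: Sequence[int], lam: float,
                     n_iter: int, rng: random.Random) -> List[float]:
    """Pegasos-style stochastic subgradient descent for a bias-free linear SVM."""
    w = [0.0] * len(rows[0])
    for t in range(1, n_iter + 1):
        i = rng.randrange(len(rows))
        x, label = rows[i], labels[i]
        eta = 1.0 / (lam * t)
        # Hinge-loss margin is evaluated with the weights before this step.
        margin = label * sum(wj * xj for wj, xj in zip(w, x))
        w = [(1.0 - eta * lam) * wj for wj in w]  # regularization shrink
        if margin < 1.0:
            w = [wj + eta * label * xj for wj, xj in zip(w, x)]
    return w


def esvm_rfe(X: Matrix, y: Sequence[int], n_keep: int, n_models: int = 5,
             lam: float = 0.01, n_iter: int = 2000,
             seed: int = 0) -> Tuple[List[int], List[int]]:
    """Ensemble SVM-RFE sketch: returns (kept features, elimination order)."""
    rng = random.Random(seed)
    active = list(range(len(X[0])))
    eliminated: List[int] = []
    while len(active) > n_keep:
        rank_sum = {j: 0 for j in active}
        for _ in range(n_models):
            # One ensemble member per balanced bootstrap, on active features only.
            idx = balanced_bootstrap(y, rng)
            rows = [[X[i][j] for j in active] for i in idx]
            labels = [y[i] for i in idx]
            w = train_linear_svm(rows, labels, lam, n_iter, rng)
            # SVM-RFE criterion w_j^2; rank 0 marks the least important feature.
            order = sorted(range(len(active)), key=lambda k: w[k] ** 2)
            for rank, k in enumerate(order):
                rank_sum[active[k]] += rank
        # Backward elimination: drop the feature with the lowest summed rank.
        worst = min(active, key=lambda j: rank_sum[j])
        active.remove(worst)
        eliminated.append(worst)
    return active, eliminated


if __name__ == "__main__":
    # Toy imbalanced data (30 vs 10): feature 0 carries the class signal,
    # features 1-4 are pure noise.
    data_rng = random.Random(42)
    y = [1] * 30 + [-1] * 10
    X = [[3.0 * label + data_rng.gauss(0, 0.1)]
         + [data_rng.gauss(0, 1) for _ in range(4)] for label in y]
    kept, order = esvm_rfe(X, y, n_keep=1)
    print("kept:", kept, "eliminated in order:", order)
```

In practice a library SVM (for example scikit-learn's `LinearSVC`, whose `coef_` attribute holds the weights) would replace the hand-written trainer; the pure-Python version only keeps the sketch self-contained. Summing rank positions is one simple aggregation rule among several; averaging the raw w_j² scores across models would be an equally plausible reading of "aggregated as one feature ranking".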