Ensemble Feature Learning of Genomic Data Using Support Vector Machine
The identification of a subset of genes having the ability to capture the necessary information to distinguish classes of patients is crucial in bioinformatics applications. Ensemble and bagging methods have been shown to work effectively in the process of gene selection and classification. Testamen...
Main Authors: | Anaissi, Ali; Goyal, Madhu; Catchpoole, Daniel R.; Braytee, Ali; Kennedy, Paul J. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Public Library of Science 2016 |
Subjects: | Research Article |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4909287/ https://www.ncbi.nlm.nih.gov/pubmed/27304923 http://dx.doi.org/10.1371/journal.pone.0157330 |
_version_ | 1782437813430517760 |
---|---|
author | Anaissi, Ali Goyal, Madhu Catchpoole, Daniel R. Braytee, Ali Kennedy, Paul J. |
author_facet | Anaissi, Ali Goyal, Madhu Catchpoole, Daniel R. Braytee, Ali Kennedy, Paul J. |
author_sort | Anaissi, Ali |
collection | PubMed |
description | The identification of a subset of genes having the ability to capture the necessary information to distinguish classes of patients is crucial in bioinformatics applications. Ensemble and bagging methods have been shown to work effectively in the process of gene selection and classification. Testament to that is random forest which combines random decision trees with bagging to improve overall feature selection and classification accuracy. Surprisingly, the adoption of these methods in support vector machines has only recently received attention but mostly on classification not gene selection. This paper introduces an ensemble SVM-Recursive Feature Elimination (ESVM-RFE) for gene selection that follows the concepts of ensemble and bagging used in random forest but adopts the backward elimination strategy which is the rationale of RFE algorithm. The rationale behind this is, building ensemble SVM models using randomly drawn bootstrap samples from the training set, will produce different feature rankings which will be subsequently aggregated as one feature ranking. As a result, the decision for elimination of features is based upon the ranking of multiple SVM models instead of choosing one particular model. Moreover, this approach will address the problem of imbalanced datasets by constructing a nearly balanced bootstrap sample. Our experiments show that ESVM-RFE for gene selection substantially increased the classification performance on five microarray datasets compared to state-of-the-art methods. Experiments on the childhood leukaemia dataset show that an average 9% better accuracy is achieved by ESVM-RFE over SVM-RFE, and 5% over random forest based approach. The selected genes by the ESVM-RFE algorithm were further explored with Singular Value Decomposition (SVD) which reveals significant clusters with the selected data. |
format | Online Article Text |
id | pubmed-4909287 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2016 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-4909287 2016-07-06 Ensemble Feature Learning of Genomic Data Using Support Vector Machine Anaissi, Ali Goyal, Madhu Catchpoole, Daniel R. Braytee, Ali Kennedy, Paul J. PLoS One Research Article The identification of a subset of genes having the ability to capture the necessary information to distinguish classes of patients is crucial in bioinformatics applications. Ensemble and bagging methods have been shown to work effectively in the process of gene selection and classification. Testament to that is random forest which combines random decision trees with bagging to improve overall feature selection and classification accuracy. Surprisingly, the adoption of these methods in support vector machines has only recently received attention but mostly on classification not gene selection. This paper introduces an ensemble SVM-Recursive Feature Elimination (ESVM-RFE) for gene selection that follows the concepts of ensemble and bagging used in random forest but adopts the backward elimination strategy which is the rationale of RFE algorithm. The rationale behind this is, building ensemble SVM models using randomly drawn bootstrap samples from the training set, will produce different feature rankings which will be subsequently aggregated as one feature ranking. As a result, the decision for elimination of features is based upon the ranking of multiple SVM models instead of choosing one particular model. Moreover, this approach will address the problem of imbalanced datasets by constructing a nearly balanced bootstrap sample. Our experiments show that ESVM-RFE for gene selection substantially increased the classification performance on five microarray datasets compared to state-of-the-art methods. Experiments on the childhood leukaemia dataset show that an average 9% better accuracy is achieved by ESVM-RFE over SVM-RFE, and 5% over random forest based approach. The selected genes by the ESVM-RFE algorithm were further explored with Singular Value Decomposition (SVD) which reveals significant clusters with the selected data. Public Library of Science 2016-06-15 /pmc/articles/PMC4909287/ /pubmed/27304923 http://dx.doi.org/10.1371/journal.pone.0157330 Text en © 2016 Anaissi et al http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/) , which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. |
spellingShingle | Research Article Anaissi, Ali Goyal, Madhu Catchpoole, Daniel R. Braytee, Ali Kennedy, Paul J. Ensemble Feature Learning of Genomic Data Using Support Vector Machine |
title | Ensemble Feature Learning of Genomic Data Using Support Vector Machine |
title_full | Ensemble Feature Learning of Genomic Data Using Support Vector Machine |
title_fullStr | Ensemble Feature Learning of Genomic Data Using Support Vector Machine |
title_full_unstemmed | Ensemble Feature Learning of Genomic Data Using Support Vector Machine |
title_short | Ensemble Feature Learning of Genomic Data Using Support Vector Machine |
title_sort | ensemble feature learning of genomic data using support vector machine |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4909287/ https://www.ncbi.nlm.nih.gov/pubmed/27304923 http://dx.doi.org/10.1371/journal.pone.0157330 |
work_keys_str_mv | AT anaissiali ensemblefeaturelearningofgenomicdatausingsupportvectormachine AT goyalmadhu ensemblefeaturelearningofgenomicdatausingsupportvectormachine AT catchpooledanielr ensemblefeaturelearningofgenomicdatausingsupportvectormachine AT brayteeali ensemblefeaturelearningofgenomicdatausingsupportvectormachine AT kennedypaulj ensemblefeaturelearningofgenomicdatausingsupportvectormachine |
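
The description field in this record outlines the ESVM-RFE idea: train several SVMs on bootstrap samples that are roughly class-balanced, aggregate their feature rankings into one, and eliminate features backward. The following is a minimal Python sketch of that loop, not the authors' implementation. Several pieces are assumptions made only for illustration: the function names, the Pegasos-style subgradient trainer used as a stand-in for a real SVM library, the exactly balanced bootstrap (the abstract says "nearly balanced"), ranking by squared weight, aggregation by summed rank, and removing one feature per round.

```python
import random


def train_linear_svm(X, y, lam=0.01, epochs=50, rng=None):
    """Stand-in linear SVM (assumption): Pegasos-style subgradient descent.

    Labels must be +1 or -1. No bias term is fitted.
    """
    rng = rng or random.Random(0)
    w = [0.0] * len(X[0])
    t = 0
    for _ in range(epochs):
        order = list(range(len(X)))
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, X[i]))
            # Regularisation shrink, then a hinge-loss step if the margin is violated.
            w = [(1.0 - eta * lam) * wj for wj in w]
            if margin < 1.0:
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w


def balanced_bootstrap(y, rng):
    """Draw sample indices with replacement, equally many from each class (assumption)."""
    pos = [i for i, label in enumerate(y) if label == 1]
    neg = [i for i, label in enumerate(y) if label == -1]
    k = min(len(pos), len(neg))
    return [rng.choice(pos) for _ in range(k)] + [rng.choice(neg) for _ in range(k)]


def esvm_rfe(X, y, n_keep, n_models=10, seed=0):
    """Backward elimination driven by rankings aggregated over an SVM ensemble.

    Returns (kept feature indices, eliminated feature indices in removal order).
    """
    rng = random.Random(seed)
    active = list(range(len(X[0])))  # indices of features still in play
    eliminated = []
    while len(active) > n_keep:
        rank_sum = [0] * len(active)
        for _ in range(n_models):
            idx = balanced_bootstrap(y, rng)
            Xb = [[X[i][j] for j in active] for i in idx]
            yb = [y[i] for i in idx]
            w = train_linear_svm(Xb, yb, rng=rng)
            # Rank features within this model by squared weight (0 = least important).
            order = sorted(range(len(active)), key=lambda j: w[j] ** 2)
            for rank, j in enumerate(order):
                rank_sum[j] += rank
        # Drop the feature with the lowest aggregated rank across all models.
        worst = min(range(len(active)), key=lambda j: rank_sum[j])
        eliminated.append(active.pop(worst))
    return active, eliminated
```

Calling `esvm_rfe(X, y, n_keep=20)` on a list-of-lists expression matrix `X` with labels `y` in `{+1, -1}` would return the 20 surviving feature indices together with the elimination order. For real microarray data, a library SVM and removing a fraction of the features per round would likely be more practical than this pure-Python stand-in.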