Selecting Feature Subsets Based on SVM-RFE and the Overlapping Ratio with Applications in Bioinformatics
Feature selection is an important topic in bioinformatics. Defining informative features from complex high dimensional biological data is critical in disease study, drug development, etc. Support vector machine-recursive feature elimination (SVM-RFE) is an efficient feature selection technique that...
Main Authors: | Lin, Xiaohui; Li, Chao; Zhang, Yanhui; Su, Benzhe; Fan, Meng; Wei, Hai |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | MDPI 2017 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5943966/ https://www.ncbi.nlm.nih.gov/pubmed/29278382 http://dx.doi.org/10.3390/molecules23010052 |
_version_ | 1783321732068671488 |
---|---|
author | Lin, Xiaohui; Li, Chao; Zhang, Yanhui; Su, Benzhe; Fan, Meng; Wei, Hai |
author_sort | Lin, Xiaohui |
collection | PubMed |
description | Feature selection is an important topic in bioinformatics. Defining informative features from complex high-dimensional biological data is critical in disease study, drug development, etc. Support vector machine-recursive feature elimination (SVM-RFE) is an efficient feature selection technique that has shown its power in many applications. It ranks the features according to the recursive feature deletion sequence based on SVM. In this study, we propose a method, SVM-RFE-OA, which combines the classification accuracy rate and the average overlapping ratio of the samples to determine the number of features to be selected from the feature rank of SVM-RFE. Meanwhile, to measure the feature weights more accurately, we propose a modified SVM-RFE-OA (M-SVM-RFE-OA) algorithm that temporarily screens out the samples lying in a heavily overlapping area in each iteration. Experiments on eight public biological datasets show that the discriminative ability of a feature subset could be measured more accurately by combining the classification accuracy rate with the average overlapping degree of the samples than by using the classification accuracy rate alone. Shielding the samples in the overlapping area also made the calculation of the feature weights more stable and accurate. The methods proposed in this study can also be used with other RFE techniques to define potential biomarkers from big biological data. |
format | Online Article Text |
id | pubmed-5943966 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2017 |
publisher | MDPI |
record_format | MEDLINE/PubMed |
spelling | pubmed-5943966 2018-11-13 Selecting Feature Subsets Based on SVM-RFE and the Overlapping Ratio with Applications in Bioinformatics Lin, Xiaohui; Li, Chao; Zhang, Yanhui; Su, Benzhe; Fan, Meng; Wei, Hai Molecules Article MDPI 2017-12-26 /pmc/articles/PMC5943966/ /pubmed/29278382 http://dx.doi.org/10.3390/molecules23010052 Text en © 2017 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/). |
title | Selecting Feature Subsets Based on SVM-RFE and the Overlapping Ratio with Applications in Bioinformatics |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5943966/ https://www.ncbi.nlm.nih.gov/pubmed/29278382 http://dx.doi.org/10.3390/molecules23010052 |
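
The description field above says that SVM-RFE ranks features by the order in which they are recursively deleted. The following is a minimal sketch of that ranking step only, not the authors' implementation. It assumes a linear SVM trained by plain full-batch subgradient descent on the hinge loss, removes one feature per round using the squared weight as the importance score, and runs on a made-up toy dataset. The function names `train_linear_svm` and `svm_rfe_ranking`, the hyperparameters, and the data are all hypothetical. The overlapping-ratio criterion of SVM-RFE-OA is not implemented, because the record does not define it.

```python
def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Fit (w, b) by full-batch subgradient descent on the
    L2-regularised hinge loss. Hyperparameters are illustrative."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the L2 penalty
        grad_b = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1.0:  # sample violates the margin
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def svm_rfe_ranking(X, y):
    """Return feature indices ordered from most to least important.

    Each round retrains the SVM on the surviving features and deletes
    the one with the smallest squared weight; features deleted later
    rank higher.
    """
    remaining = list(range(len(X[0])))
    eliminated = []
    while remaining:
        X_sub = [[row[j] for j in remaining] for row in X]
        w, _ = train_linear_svm(X_sub, y)
        worst = min(range(len(remaining)), key=lambda k: w[k] ** 2)
        eliminated.append(remaining.pop(worst))
    return eliminated[::-1]


# Hypothetical toy data: feature 0 tracks the label strongly, feature 1
# weakly, and features 2 and 3 are uncorrelated with the label.
y = [1, -1, 1, -1, 1, -1, 1, -1]
noise_a = [0.5, 0.5, -0.5, -0.5, 0.5, 0.5, -0.5, -0.5]
noise_b = [0.5, 0.5, 0.5, 0.5, -0.5, -0.5, -0.5, -0.5]
X = [[2.0 * yi, 0.5 * yi, a, c] for yi, a, c in zip(y, noise_a, noise_b)]

ranking = svm_rfe_ranking(X, y)
print(ranking)  # the two label-aligned features should rank first
```

Per the description, SVM-RFE-OA would then choose how many top-ranked features to keep by combining classification accuracy with the average overlapping ratio of the samples. This sketch stops at the ranking.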