Selecting Feature Subsets Based on SVM-RFE and the Overlapping Ratio with Applications in Bioinformatics

Feature selection is an important topic in bioinformatics. Identifying informative features in complex, high-dimensional biological data is critical in disease research, drug development, and related areas. Support vector machine-recursive feature elimination (SVM-RFE) is an efficient feature selection technique that...

Bibliographic Details
Main Authors: Lin, Xiaohui, Li, Chao, Zhang, Yanhui, Su, Benzhe, Fan, Meng, Wei, Hai
Format: Online Article Text
Language: English
Published: MDPI 2017
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5943966/
https://www.ncbi.nlm.nih.gov/pubmed/29278382
http://dx.doi.org/10.3390/molecules23010052
_version_ 1783321732068671488
author Lin, Xiaohui
Li, Chao
Zhang, Yanhui
Su, Benzhe
Fan, Meng
Wei, Hai
author_sort Lin, Xiaohui
collection PubMed
description Feature selection is an important topic in bioinformatics. Identifying informative features in complex, high-dimensional biological data is critical in disease research, drug development, and related areas. Support vector machine-recursive feature elimination (SVM-RFE) is an efficient feature selection technique that has proven effective in many applications. It ranks the features according to the order in which they are recursively eliminated by an SVM. In this study, we propose a method, SVM-RFE-OA, which combines the classification accuracy rate and the average overlapping ratio of the samples to determine the number of features to be selected from the SVM-RFE feature ranking. In addition, to measure the feature weights more accurately, we propose a modified SVM-RFE-OA (M-SVM-RFE-OA) algorithm that temporarily screens out the samples lying in a heavily overlapping area in each iteration. Experiments on eight public biological datasets show that combining the classification accuracy rate with the average overlapping degree of the samples measures the discriminative ability of a feature subset more accurately than the classification accuracy rate alone, and that shielding the samples in the overlapping area makes the calculation of the feature weights more stable and accurate. The methods proposed in this study can also be used with other RFE techniques to identify potential biomarkers in large biological datasets.
format Online
Article
Text
id pubmed-5943966
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-5943966 2018-11-13 Molecules Article MDPI 2017-12-26 /pmc/articles/PMC5943966/ /pubmed/29278382 http://dx.doi.org/10.3390/molecules23010052 Text en © 2017 by the authors. Licensee MDPI, Basel, Switzerland.
This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
title Selecting Feature Subsets Based on SVM-RFE and the Overlapping Ratio with Applications in Bioinformatics
topic Article
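The description in this record says that SVM-RFE ranks features by the order in which a linear SVM recursively eliminates them, and that the number of features to keep is then chosen from that ranking. The following is a minimal sketch of those two steps only, not the authors' SVM-RFE-OA implementation. It assumes scikit-learn, uses synthetic data, and the function names `svm_rfe_ranking` and `pick_subset_by_accuracy` are hypothetical. Because this record does not define the overlapping ratio, the sketch picks the subset size by cross-validated accuracy alone, which is the baseline criterion the description compares against.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC


def svm_rfe_ranking(X, y):
    """Return feature indices ordered from most to least important.

    RFE refits a linear SVM and drops one feature per round (the one with
    the smallest squared weight). ranking_ == 1 marks the last survivor.
    """
    selector = RFE(SVC(kernel="linear", C=1.0), n_features_to_select=1, step=1)
    selector.fit(X, y)
    return np.argsort(selector.ranking_)


def pick_subset_by_accuracy(X, y, order, cv=5):
    """Choose the top-k prefix of the ranking with the best CV accuracy.

    Assumption: accuracy is the only criterion here. The overlapping ratio
    used by SVM-RFE-OA is not defined in this record, so it is omitted.
    """
    best_k, best_acc = 1, -1.0
    for k in range(1, len(order) + 1):
        scores = cross_val_score(SVC(kernel="linear"), X[:, order[:k]], y, cv=cv)
        if scores.mean() > best_acc:
            best_k, best_acc = k, scores.mean()
    return order[:best_k], best_acc


# Synthetic stand-in for a biological dataset (an assumption for illustration).
X, y = make_classification(n_samples=80, n_features=20, n_informative=4,
                           n_redundant=0, random_state=0)
order = svm_rfe_ranking(X, y)
subset, acc = pick_subset_by_accuracy(X, y, order)
print("ranking (best first):", order.tolist())
print("selected features:", subset.tolist(), "CV accuracy:", round(acc, 3))
```

Ranking the features on the full dataset and then cross-validating on the same samples gives an optimistic accuracy estimate. For a fair evaluation, the ranking step would need to be repeated inside each training fold.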