
Feature Selection and Feature Stability Measurement Method for High-Dimensional Small Sample Data Based on Big Data Technology

In recent years, rapid progress in artificial intelligence has deepened research on image processing, text mining, and genome informatics, and the mining of large-scale databases has received growing attention. The objects of data mining have also become more...

Bibliographic Details
Main Author: Huang, Chengyuan
Format: Online Article Text
Language: English
Published: Hindawi 2021
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8486514/
https://www.ncbi.nlm.nih.gov/pubmed/34603430
http://dx.doi.org/10.1155/2021/3597051
author Huang, Chengyuan
collection PubMed
description In recent years, rapid progress in artificial intelligence has deepened research on image processing, text mining, and genome informatics, and the mining of large-scale databases has received growing attention. The objects of data mining have become more complex, and their data dimensionality has grown steadily. When the number of samples available for analysis is very small relative to this ultra-high dimensionality, the result is high-dimensional small-sample data, which exposes the mining process to a severe curse of dimensionality. Feature selection can effectively eliminate redundant and noisy features from such data, avoiding the curse of dimensionality and improving the practical efficiency of mining algorithms. However, existing feature selection methods emphasize the classification or clustering performance of the selected features and ignore the stability of the selection, so their results are unstable and it is difficult to obtain real and interpretable features. Building on traditional feature selection, this paper proposes an ensemble feature selection method, Random Bits Forest Recursive Clustering Eliminate (RBF-RCE), which combines multiple sets of base classifiers that learn in parallel and screens out the best feature classification results; it optimizes the classification performance of traditional feature selection methods and can also improve the stability of feature selection. The paper then analyzes the causes of feature selection instability and introduces a stability measure, the Intersection Measurement (IM), to evaluate whether the feature selection process is stable. The effectiveness of the proposed method is verified by experiments on several groups of high-dimensional small-sample data sets.
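The idea of scoring selection stability by how much repeated selections overlap can be illustrated with a small sketch. This record does not give the formula for the Intersection Measurement (IM), so the code below is not the author's measure: it swaps in the mean pairwise Jaccard index, a common intersection-based stability score. The function names and the example feature indices are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Intersection over union of two sets of selected feature indices."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # two empty selections are trivially identical
    return len(a & b) / len(a | b)


def selection_stability(subsets):
    """Mean pairwise Jaccard index over all selected-feature subsets.

    Assumption: this stands in for the paper's IM, whose exact
    definition is not stated in this record.
    """
    pairs = list(combinations(subsets, 2))
    if not pairs:
        raise ValueError("need at least two feature subsets")
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Hypothetical feature indices selected on three resamples of one data set.
runs = [{0, 3, 7, 9}, {0, 3, 7, 12}, {0, 3, 8, 9}]
score = selection_stability(runs)
print(f"stability = {score:.3f}")
```

A score of 1.0 means every run selected the same features, and values near 0 mean the runs share almost nothing, which is the instability the abstract describes.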
format Online
Article
Text
id pubmed-8486514
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Hindawi
record_format MEDLINE/PubMed
spelling pubmed-8486514 2021-10-02 Feature Selection and Feature Stability Measurement Method for High-Dimensional Small Sample Data Based on Big Data Technology Huang, Chengyuan Comput Intell Neurosci Research Article Hindawi 2021-09-23 /pmc/articles/PMC8486514/ /pubmed/34603430 http://dx.doi.org/10.1155/2021/3597051 Text en Copyright © 2021 Chengyuan Huang. https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Feature Selection and Feature Stability Measurement Method for High-Dimensional Small Sample Data Based on Big Data Technology
topic Research Article