Cargando…
Seeding and Harvest: A Framework for Unsupervised Feature Selection Problems
Feature selection, also known as attribute selection, is the technique of selecting a subset of relevant features for building robust object models. It is becoming more and more important for large-scale sensors applications with AI capabilities. The core idea of this paper is derived from a straigh...
Autores principales: | , , |
---|---|
Formato: | Online Artículo Texto |
Lenguaje: | English |
Publicado: |
MDPI
2012
|
Materias: | |
Acceso en línea: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3574678/ https://www.ncbi.nlm.nih.gov/pubmed/23271599 http://dx.doi.org/10.3390/s130100292 |
Sumario: | Feature selection, also known as attribute selection, is the technique of selecting a subset of relevant features for building robust object models. It is becoming more and more important for large-scale sensors applications with AI capabilities. The core idea of this paper is derived from a straightforward and intuitive principle saying that, if a feature subset (pattern) has more representativeness, it should be more self-organized, and as a result it should be more insensitive to artificially seeded noise points. In the light of this heuristic finding, we established the whole set of theoretical principles, based on which we proposed a two-stage framework to evaluate the relative importance of feature subsets, called seeding and harvest (S&H for short). At the first stage, we inject a number of artificial noise points into the original dataset; then at the second stage, we resort to an outlier detector to identify them under various feature patterns. The more precisely the seeded points can be extracted under a particular feature pattern, the more valuable and important the corresponding feature pattern should be. Besides, we compared our method with several state-of-the-art feature selection methods on a number of real-life datasets. The experiment results significantly confirm that our method can accomplish feature reduction tasks with high accuracy as well as low computing complexity. |
---|