Cargando…
Characteristics of predictor sets found using differential prioritization
BACKGROUND: Feature selection plays an undeniably important role in classification problems involving high dimensional datasets such as microarray datasets. For filter-based feature selection, two well-known criteria used in forming predictor sets are relevance and redundancy. However, there is a th...
Autores principales: | , , |
---|---|
Formato: | Texto |
Lenguaje: | English |
Publicado: |
BioMed Central
2007
|
Materias: | |
Acceso en línea: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1920513/ https://www.ncbi.nlm.nih.gov/pubmed/17547742 http://dx.doi.org/10.1186/1748-7188-2-7 |
Sumario: | BACKGROUND: Feature selection plays an undeniably important role in classification problems involving high dimensional datasets such as microarray datasets. For filter-based feature selection, two well-known criteria used in forming predictor sets are relevance and redundancy. However, there is a third criterion which is at least as important as the other two in affecting the efficacy of the resulting predictor sets. This criterion is the degree of differential prioritization (DDP), which varies the emphases on relevance and redundancy depending on the value of the DDP. Previous empirical works on publicly available microarray datasets have confirmed the effectiveness of the DDP in molecular classification. We now propose to establish the fundamental strengths and merits of the DDP-based feature selection technique. This is to be done through a simulation study which involves vigorous analyses of the characteristics of predictor sets found using different values of the DDP from toy datasets designed to mimic real-life microarray datasets. RESULTS: A simulation study employing analytical measures such as the distance between classes before and after transformation using principal component analysis is implemented on toy datasets. From these analyses, the necessity of adjusting the differential prioritization based on the dataset of interest is established. This conclusion is supported by comparisons against both simplistic rank-based selection and state-of-the-art equal-priorities scoring methods, which demonstrates the superiority of the DDP-based feature selection technique. Reapplying similar analyses to real-life multiclass microarray datasets provides further confirmation of our findings and of the significance of the DDP for practical applications. CONCLUSION: The findings have been achieved based on analytical evaluations, not empirical evaluation involving classifiers, thus providing further basis for the usefulness of the DDP and validating the need for unequal priorities on relevance and redundancy during feature selection for microarray datasets, especially highly multiclass datasets. |
---|