A feature selection method for classification within functional genomics experiments based on the proportional overlapping score

Bibliographic Details
Main Authors: Mahmoud, Osama, Harrison, Andrew, Perperoglou, Aris, Gul, Asma, Khan, Zardad, Metodiev, Metodi V, Lausen, Berthold
Format: Online Article Text
Language: English
Published: BioMed Central 2014
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4141116/
https://www.ncbi.nlm.nih.gov/pubmed/25113817
http://dx.doi.org/10.1186/1471-2105-15-274
Description
Summary: BACKGROUND: Microarray technology, like other functional genomics experiments, allows simultaneous measurement of thousands of genes within each sample. Both the prediction accuracy and the interpretability of a classifier could be enhanced by performing the classification based only on selected discriminative genes. We propose a statistical method for selecting genes based on overlapping analysis of expression data across classes. This method results in a novel measure, called the proportional overlapping score (POS), of a feature's relevance to a classification task. RESULTS: We apply POS, along with four widely used gene selection methods, to several benchmark gene expression datasets. The experimental results of classification error rates computed using the Random Forest, k Nearest Neighbor and Support Vector Machine classifiers show that POS achieves better performance. CONCLUSIONS: A novel gene selection method, POS, is proposed. POS analyzes the overlap of expression values across classes, taking into account the proportions of overlapping samples. It robustly defines a mask for each gene, which allows it to minimize the effect of expression outliers. The constructed masks, along with a novel gene score, are exploited to produce the selected subset of genes. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/1471-2105-15-274) contains supplementary material, which is available to authorized users.