A comparative study of improvements Pre-filter methods bring on feature selection using microarray data

Bibliographic Details
Main Authors: Wang, Yingying; Fan, Xiaomao; Cai, Yunpeng
Format: Online Article Text
Language: English
Published: BioMed Central 2014
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4340279/
https://www.ncbi.nlm.nih.gov/pubmed/25825671
http://dx.doi.org/10.1186/2047-2501-2-7
Description
Summary: BACKGROUND: Feature selection techniques have become essential in biomarker discovery with the development of microarray technology. However, the high dimensionality of microarray data makes feature selection time-consuming. To overcome this difficulty, filtering data according to background knowledge before applying feature selection techniques has become a hot topic in microarray analysis. Different methods may greatly affect the final results, so it is important to evaluate these pre-filter methods systematically. METHODS: In this paper, we compared the performance of statistical-based and biological-based pre-filter methods, and their combination, on microRNA-mRNA parallel expression profiles, using L1 logistic regression as the feature selection technique. Four types of data were built for both microRNA and mRNA expression profiles. RESULTS: Pre-filter methods greatly reduced the number of features for both mRNA and microRNA expression datasets. The features selected after the pre-filter procedures were shown to be significant at biological levels such as biological process and microRNA function. Analyses of classification performance based on precision showed that pre-filter methods were necessary when the number of raw features was much larger than the number of samples. Computing time was greatly shortened after the pre-filter procedures in all cases. CONCLUSIONS: With similar or better classification performance and fewer but biologically significant features, pre-filter-based feature selection should be considered when researchers need fast results on complex computing problems in bioinformatics. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/2047-2501-2-7) contains supplementary material, which is available to authorized users.
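
The pipeline named in the summary (a pre-filter followed by L1 logistic regression) can be illustrated with a minimal sketch. Everything specific here is an assumption rather than the authors' method: the summary does not say which statistical filter was used, so a simple variance ranking stands in for it; the data are synthetic; and the names `variance_prefilter` and `fit_l1_logistic`, the penalty `lam=0.05`, and the cutoff of 50 features are hypothetical choices. The L1 fit uses plain proximal gradient descent in NumPy to stay self-contained.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for an expression matrix (NOT the paper's data):
# 40 samples, 500 features, of which the first 5 differ between classes.
n_samples, n_features, n_informative = 40, 500, 5
y = np.array([0, 1] * (n_samples // 2))
X = rng.normal(scale=0.3, size=(n_samples, n_features))
X[:, :n_informative] += (2 * y - 1)[:, None]  # shift informative features by class


def variance_prefilter(X, keep):
    """Assumed statistical pre-filter: indices of the `keep` highest-variance columns."""
    variances = X.var(axis=0)
    top = np.argsort(variances)[::-1][:keep]
    return np.sort(top)


def fit_l1_logistic(X, y, lam, n_iter=500):
    """L1-penalized logistic regression fitted by proximal gradient descent."""
    n, p = X.shape
    w, b = np.zeros(p), 0.0
    # Step size 1/L, with L an upper bound on the gradient's Lipschitz constant.
    step = 4.0 * n / (np.linalg.norm(X, 2) ** 2 + n)
    for _ in range(n_iter):
        z = np.clip(X @ w + b, -30, 30)  # clip to avoid overflow in exp
        residual = 1.0 / (1.0 + np.exp(-z)) - y
        w = w - step * (X.T @ residual) / n
        b = b - step * residual.mean()
        # Soft-thresholding is the proximal operator of the L1 penalty;
        # it sets small weights exactly to zero, which performs the selection.
        w = np.sign(w) * np.maximum(np.abs(w) - step * lam, 0.0)
    return w, b


# Step 1: pre-filter shrinks 500 features to 50 before the costlier fit.
kept = variance_prefilter(X, keep=50)
X_kept = X[:, kept] - X[:, kept].mean(axis=0)  # center the retained columns

# Step 2: L1 logistic regression selects features among those retained.
w, b = fit_l1_logistic(X_kept, y, lam=0.05)
selected = kept[np.nonzero(w)[0]]

print("features after pre-filter:", len(kept))
print("features selected by L1 model:", len(selected))
```

In this sketch the pre-filter reduces the width of the matrix passed to the iterative fit, which is the kind of saving in computing time the summary describes. A biological pre-filter would replace `variance_prefilter` with a lookup against curated annotations, which is not shown here.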