Feature Selection for high Dimensional DNA Microarray data using hybrid approaches

Feature selection from DNA microarray data is a major challenge because expression data are high-dimensional. The number of samples in a microarray data set is much smaller than the number of genes, so the data are poorly suited for direct use as a classifier training set. Therefore it is...

Bibliographic Details
Main authors:	Kumar, Ammu Prasanna; Valsala, Preeja
Format:	Online Article Text
Language:	English
Published:	Biomedical Informatics 2013
Subjects:	Hypothesis
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3796884/ https://www.ncbi.nlm.nih.gov/pubmed/24143053 http://dx.doi.org/10.6026/97320630009824

_version_	1782287545967575040
author	Kumar, Ammu Prasanna; Valsala, Preeja
collection	PubMed
description	Feature selection from DNA microarray data is a major challenge because expression data are high-dimensional. The number of samples in a microarray data set is much smaller than the number of genes, so the data are poorly suited for direct use as a classifier training set. It is therefore important to select features before training the classifier. Only a small subset of genes in the data set exhibits a strong correlation with the class, and finding these relevant genes is often non-trivial. Thus there is a need to develop robust and reliable methods for gene finding in expression data. We describe the use of several hybrid feature selection approaches for gene finding in expression data. These approaches combine a filtering phase (selecting the best genes from the data set) and a wrapper phase (finding the best subset of genes from the data set). The methods use information gain (IG) and Pearson Product Moment Correlation (PPMC) as the filtering criteria and biogeography-based optimization (BBO) as the wrapper approach. The K nearest neighbour algorithm (KNN) and a back propagation neural network are used to evaluate the fitness of gene subsets during feature selection. Our analysis shows that the IG-BBO-KNN combination performs well across different data sets, with high accuracy (>90%) and a low error rate.
format	Online Article Text
id	pubmed-3796884
institution	National Center for Biotechnology Information
language	English
publishDate	2013
publisher	Biomedical Informatics
record_format	MEDLINE/PubMed
spelling	pubmed-3796884 2013-10-18 Bioinformation Biomedical Informatics 2013-09-23 /pmc/articles/PMC3796884/ /pubmed/24143053 http://dx.doi.org/10.6026/97320630009824 Text en © 2013 Biomedical Informatics This is an open-access article, which permits unrestricted use, distribution, and reproduction in any medium, for non-commercial purposes, provided the original author and source are credited.
title	Feature Selection for high Dimensional DNA Microarray data using hybrid approaches
topic	Hypothesis
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3796884/ https://www.ncbi.nlm.nih.gov/pubmed/24143053 http://dx.doi.org/10.6026/97320630009824

Feature Selection for high Dimensional DNA Microarray data using hybrid approaches
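
The filter-then-evaluate idea summarized in the description field can be illustrated with a minimal Python sketch. It is not the authors' implementation: it covers only a PPMC-based filter and a KNN fitness score (leave-one-out accuracy is an assumption made here), and it omits the information gain filter, the BBO wrapper search, and the neural network. The function names (`pearson`, `filter_genes`, `knn_fitness`) and the toy data are hypothetical and chosen for illustration only.

```python
import math

def pearson(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return 0.0 if sx == 0 or sy == 0 else cov / (sx * sy)

def filter_genes(samples, labels, top_k):
    """Filter phase: rank genes by |correlation with the class|, keep top_k indices."""
    n_genes = len(samples[0])
    scores = [abs(pearson([s[g] for s in samples], labels)) for g in range(n_genes)]
    return sorted(range(n_genes), key=lambda g: scores[g], reverse=True)[:top_k]

def knn_fitness(samples, labels, genes, k=1):
    """Fitness of a gene subset: leave-one-out accuracy of a k-NN classifier."""
    correct = 0
    for i, query in enumerate(samples):
        # Squared Euclidean distance restricted to the chosen genes.
        dists = sorted(
            (sum((query[g] - other[g]) ** 2 for g in genes), labels[j])
            for j, other in enumerate(samples) if j != i
        )
        votes = [label for _, label in dists[:k]]
        predicted = max(set(votes), key=votes.count)
        if predicted == labels[i]:
            correct += 1
    return correct / len(samples)

# Toy data (invented for illustration): 6 samples x 4 genes, binary class labels.
samples = [
    [1.0, 5.0, 0.2, 3.0],
    [1.2, 4.0, 0.9, 3.1],
    [0.9, 6.0, 0.4, 2.9],
    [3.0, 5.5, 0.3, 3.0],
    [3.2, 4.5, 0.8, 3.2],
    [2.9, 5.0, 0.5, 2.8],
]
labels = [0, 0, 0, 1, 1, 1]

selected = filter_genes(samples, labels, top_k=2)
fitness = knn_fitness(samples, labels, selected, k=1)
print("selected genes:", selected, "fitness:", fitness)
```

In a full hybrid approach, a wrapper search such as BBO would presumably call a fitness function like `knn_fitness` on many candidate subsets drawn from the filtered genes, rather than scoring a single subset as done here.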
