
A comparative study of different machine learning methods on microarray gene expression data

BACKGROUND: Several classification and feature selection methods have been studied for the identification of differentially expressed genes in microarray data. Classification methods such as SVM, RBF Neural Nets, MLP Neural Nets, Bayesian, Decision Tree and Random Forest methods have been used in r...


Bibliographic Details
Main authors: Pirooznia, Mehdi, Yang, Jack Y, Yang, Mary Qu, Deng, Youping
Format: Text
Language: English
Published: BioMed Central 2008
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2386055/
https://www.ncbi.nlm.nih.gov/pubmed/18366602
http://dx.doi.org/10.1186/1471-2164-9-S1-S13
author Pirooznia, Mehdi
Yang, Jack Y
Yang, Mary Qu
Deng, Youping
collection PubMed
description BACKGROUND: Several classification and feature selection methods have been studied for the identification of differentially expressed genes in microarray data. Classification methods such as SVM, RBF Neural Nets, MLP Neural Nets, Bayesian, Decision Tree and Random Forest methods have been used in recent studies. The accuracy of these methods has been calculated with validation methods such as v-fold validation. However, there is a lack of comparison between these methods to find a better framework for classification, clustering and analysis of microarray gene expression results. RESULTS: In this study, we compared the efficiency of classification methods including SVM, RBF Neural Nets, MLP Neural Nets, Bayesian, Decision Tree and Random Forest methods. V-fold cross-validation was used to calculate the accuracy of the classifiers. Some of the common clustering methods, including K-means, DBC, and EM clustering, were applied to the datasets and the efficiency of these methods was analysed. Further, the efficiency of feature selection methods including support vector machine recursive feature elimination (SVM-RFE), Chi Squared, and CSF was compared. In each case these methods were applied to eight different binary (two-class) microarray datasets. We evaluated the class prediction efficiency of each gene list in training and test cross-validation using supervised classifiers. CONCLUSIONS: We presented a study in which we compared some of the commonly used classification, clustering, and feature selection methods. We applied these methods to eight publicly available datasets, and compared how these methods performed in class prediction of test datasets. We reported that the choice of feature selection method, the number of genes in the gene list, and the number of cases (samples) substantially influence classification success. Based on features chosen by these methods, error rates and accuracy of several classification algorithms were obtained. Results revealed the importance of feature selection in accurately classifying new samples and showed how an integrated feature selection and classification algorithm performs and is capable of identifying significant genes.
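The v-fold cross-validation named in the description can be illustrated with a minimal sketch. It is not the authors' code: the nearest-centroid classifier is a simple stand-in for the classifiers listed in the abstract, the data are synthetic two-class samples rather than any of the eight microarray datasets, and all function names (`v_fold_indices`, `cross_val_accuracy`, and so on) are hypothetical.

```python
import random


def v_fold_indices(n_samples, v, seed=0):
    """Shuffle the sample indices and split them into v nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::v] for i in range(v)]


def nearest_centroid_fit(X, y):
    """Compute one mean vector per class (stand-in classifier, an assumption)."""
    centroids = {}
    for label in set(y):
        rows = [x for x, lab in zip(X, y) if lab == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids


def nearest_centroid_predict(centroids, x):
    """Return the label whose centroid is closest to x (squared Euclidean)."""
    def sq_dist(center):
        return sum((a - b) ** 2 for a, b in zip(x, center))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))


def cross_val_accuracy(X, y, v=5, seed=0):
    """Hold out each fold once, train on the rest, and pool the correct calls."""
    correct = 0
    for held_out in v_fold_indices(len(X), v, seed):
        held = set(held_out)
        train = [i for i in range(len(X)) if i not in held]
        model = nearest_centroid_fit([X[i] for i in train],
                                     [y[i] for i in train])
        correct += sum(nearest_centroid_predict(model, X[i]) == y[i]
                       for i in held_out)
    return correct / len(X)


# Synthetic binary dataset: 20 samples per class, 4 "genes" per sample.
rng = random.Random(42)
X = ([[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(20)]
     + [[rng.gauss(5.0, 1.0) for _ in range(4)] for _ in range(20)])
y = [0] * 20 + [1] * 20

accuracy = cross_val_accuracy(X, y, v=5)
print(f"5-fold cross-validated accuracy: {accuracy:.3f}")
```

Every sample is used for testing exactly once, so the pooled accuracy estimates performance on unseen samples rather than on the training data.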
format Text
id pubmed-2386055
institution National Center for Biotechnology Information
language English
publishDate 2008
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-2386055 2008-05-15 A comparative study of different machine learning methods on microarray gene expression data Pirooznia, Mehdi Yang, Jack Y Yang, Mary Qu Deng, Youping BMC Genomics Research BioMed Central 2008-03-20 /pmc/articles/PMC2386055/ /pubmed/18366602 http://dx.doi.org/10.1186/1471-2164-9-S1-S13 Text en Copyright © 2008 Pirooznia et al.; licensee BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title A comparative study of different machine learning methods on microarray gene expression data
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2386055/
https://www.ncbi.nlm.nih.gov/pubmed/18366602
http://dx.doi.org/10.1186/1471-2164-9-S1-S13