Gene selection and classification for cancer microarray data based on machine learning and similarity measures

BACKGROUND: Microarray data have a high dimension of variables and a small sample size. In microarray data analyses, two important issues are how to choose genes that provide reliable and good prediction of disease status, and how to determine the final gene set that is best for classification. A...

Bibliographic Details
Main Authors:	Liu, Qingzhong; Sung, Andrew H; Chen, Zhongxue; Liu, Jianzhong; Chen, Lei; Qiao, Mengyu; Wang, Zhaohui; Huang, Xudong; Deng, Youping
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2011
Subjects:	Research Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3287491/ https://www.ncbi.nlm.nih.gov/pubmed/22369383 http://dx.doi.org/10.1186/1471-2164-12-S5-S1

_version_	1782224675286286336
author	Liu, Qingzhong Sung, Andrew H Chen, Zhongxue Liu, Jianzhong Chen, Lei Qiao, Mengyu Wang, Zhaohui Huang, Xudong Deng, Youping
author_sort	Liu, Qingzhong
collection	PubMed
description	BACKGROUND: Microarray data have a high dimension of variables and a small sample size. In microarray data analyses, two important issues are how to choose genes that provide reliable and good prediction of disease status, and how to determine the final gene set that is best for classification. Associations among genetic markers mean one can exploit information redundancy to potentially reduce classification cost in terms of time and money. RESULTS: To deal with redundant information and improve classification, we propose a gene selection method, Recursive Feature Addition, which combines supervised learning and statistical similarity measures. To determine the final optimal gene set for prediction and classification, we propose an algorithm, Lagging Prediction Peephole Optimization. Using six benchmark microarray gene expression data sets, we compared Recursive Feature Addition with recently developed gene selection methods: Support Vector Machine Recursive Feature Elimination, Leave-One-Out Calculation Sequential Forward Selection and several others. CONCLUSIONS: On average, using popular learning machines, including Nearest Mean Scaled Classifier, Support Vector Machine, Naive Bayes Classifier and Random Forest, Recursive Feature Addition outperformed other methods. Our studies also showed that Lagging Prediction Peephole Optimization is superior to a random strategy; Recursive Feature Addition with Lagging Prediction Peephole Optimization obtained better testing accuracies than the gene selection method varSelRF.
format	Online Article Text
id	pubmed-3287491
institution	National Center for Biotechnology Information
language	English
publishDate	2011
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-3287491 2012-03-01 Gene selection and classification for cancer microarray data based on machine learning and similarity measures Liu, Qingzhong Sung, Andrew H Chen, Zhongxue Liu, Jianzhong Chen, Lei Qiao, Mengyu Wang, Zhaohui Huang, Xudong Deng, Youping BMC Genomics Research Article BioMed Central 2011-12-23 /pmc/articles/PMC3287491/ /pubmed/22369383 http://dx.doi.org/10.1186/1471-2164-12-S5-S1 Text en Copyright ©2011 Liu et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title	Gene selection and classification for cancer microarray data based on machine learning and similarity measures
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3287491/ https://www.ncbi.nlm.nih.gov/pubmed/22369383 http://dx.doi.org/10.1186/1471-2164-12-S5-S1