Feature Selection and Classification of MAQC-II Breast Cancer and Multiple Myeloma Microarray Gene Expression Data

Microarray data has a high dimension of variables but available datasets usually have only a small number of samples, thereby making the study of such datasets interesting and challenging. In the task of analyzing microarray data for the purpose of, e.g., predicting gene-disease association, feature...

Bibliographic Details
Main authors:	Liu, Qingzhong; Sung, Andrew H.; Chen, Zhongxue; Liu, Jianzhong; Huang, Xudong; Deng, Youping
Format:	Text
Language:	English
Published:	Public Library of Science 2009
Subjects:	Research Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2789385/ https://www.ncbi.nlm.nih.gov/pubmed/20011240 http://dx.doi.org/10.1371/journal.pone.0008250

author	Liu, Qingzhong; Sung, Andrew H.; Chen, Zhongxue; Liu, Jianzhong; Huang, Xudong; Deng, Youping
author_facet	Liu, Qingzhong; Sung, Andrew H.; Chen, Zhongxue; Liu, Jianzhong; Huang, Xudong; Deng, Youping
author_sort	Liu, Qingzhong
collection	PubMed
description	Microarray data has a high dimension of variables, but available datasets usually have only a small number of samples, thereby making the study of such datasets interesting and challenging. In the task of analyzing microarray data for the purpose of, e.g., predicting gene-disease association, feature selection is very important because it provides a way to handle the high dimensionality by exploiting information redundancy induced by associations among genetic markers. Judicious feature selection in microarray data analysis can result in significant reduction of cost while maintaining or improving the classification or prediction accuracy of learning machines that are employed to sort out the datasets. In this paper, we propose a gene selection method called Recursive Feature Addition (RFA), which combines supervised learning and statistical similarity measures. We compare our method with the following gene selection methods: Support Vector Machine Recursive Feature Elimination (SVMRFE), Leave-One-Out Calculation Sequential Forward Selection (LOOCSFS), and Gradient-based Leave-One-Out Gene Selection (GLGS). To evaluate the performance of these gene selection methods, we employ several popular learning classifiers on the MicroArray Quality Control phase II on predictive modeling (MAQC-II) breast cancer dataset and the MAQC-II multiple myeloma dataset. Experimental results show that gene selection is strictly paired with the learning classifier. Overall, our approach outperforms the other compared methods. The biological functional analysis based on the MAQC-II breast cancer dataset convinced us to apply our method for phenotype prediction.
Additionally, learning classifiers also play important roles in the classification of microarray data, and our experimental results indicate that the Nearest Mean Scale Classifier (NMSC) is a good choice due to its prediction reliability and its stability across the three performance measurements: testing accuracy, MCC values, and AUC errors.
format	Text
id	pubmed-2789385
institution	National Center for Biotechnology Information
language	English
publishDate	2009
publisher	Public Library of Science
record_format	MEDLINE/PubMed
spelling	pubmed-2789385 2009-12-15 Feature Selection and Classification of MAQC-II Breast Cancer and Multiple Myeloma Microarray Gene Expression Data Liu, Qingzhong; Sung, Andrew H.; Chen, Zhongxue; Liu, Jianzhong; Huang, Xudong; Deng, Youping PLoS One Research Article Microarray data has a high dimension of variables, but available datasets usually have only a small number of samples, thereby making the study of such datasets interesting and challenging. In the task of analyzing microarray data for the purpose of, e.g., predicting gene-disease association, feature selection is very important because it provides a way to handle the high dimensionality by exploiting information redundancy induced by associations among genetic markers. Judicious feature selection in microarray data analysis can result in significant reduction of cost while maintaining or improving the classification or prediction accuracy of learning machines that are employed to sort out the datasets. In this paper, we propose a gene selection method called Recursive Feature Addition (RFA), which combines supervised learning and statistical similarity measures. We compare our method with the following gene selection methods: Support Vector Machine Recursive Feature Elimination (SVMRFE), Leave-One-Out Calculation Sequential Forward Selection (LOOCSFS), and Gradient-based Leave-One-Out Gene Selection (GLGS). To evaluate the performance of these gene selection methods, we employ several popular learning classifiers on the MicroArray Quality Control phase II on predictive modeling (MAQC-II) breast cancer dataset and the MAQC-II multiple myeloma dataset. Experimental results show that gene selection is strictly paired with the learning classifier. Overall, our approach outperforms the other compared methods. The biological functional analysis based on the MAQC-II breast cancer dataset convinced us to apply our method for phenotype prediction.
Additionally, learning classifiers also play important roles in the classification of microarray data, and our experimental results indicate that the Nearest Mean Scale Classifier (NMSC) is a good choice due to its prediction reliability and its stability across the three performance measurements: testing accuracy, MCC values, and AUC errors. Public Library of Science 2009-12-11 /pmc/articles/PMC2789385/ /pubmed/20011240 http://dx.doi.org/10.1371/journal.pone.0008250 Text en Liu et al. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
spellingShingle	Research Article Liu, Qingzhong Sung, Andrew H. Chen, Zhongxue Liu, Jianzhong Huang, Xudong Deng, Youping Feature Selection and Classification of MAQC-II Breast Cancer and Multiple Myeloma Microarray Gene Expression Data
title	Feature Selection and Classification of MAQC-II Breast Cancer and Multiple Myeloma Microarray Gene Expression Data
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2789385/ https://www.ncbi.nlm.nih.gov/pubmed/20011240 http://dx.doi.org/10.1371/journal.pone.0008250

Similar Items