Comparison of feature selection and classification for MALDI-MS data

INTRODUCTION: In the classification of Mass Spectrometry (MS) proteomics data, peak detection, feature selection, and learning classifiers are critical to classification accuracy. To better understand which methods are more accurate when classifying data, some publicly available peak detection algor...


Bibliographic Details
Main authors: Liu, Qingzhong; Sung, Andrew H; Qiao, Mengyu; Chen, Zhongxue; Yang, Jack Y; Yang, Mary Qu; Huang, Xudong; Deng, Youping
Format: Text
Language: English
Published: BioMed Central 2009
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2709264/
https://www.ncbi.nlm.nih.gov/pubmed/19594880
http://dx.doi.org/10.1186/1471-2164-10-S1-S3
author Liu, Qingzhong
Sung, Andrew H
Qiao, Mengyu
Chen, Zhongxue
Yang, Jack Y
Yang, Mary Qu
Huang, Xudong
Deng, Youping
collection PubMed
description INTRODUCTION: In the classification of Mass Spectrometry (MS) proteomics data, peak detection, feature selection, and learning classifiers are critical to classification accuracy. To better understand which methods are more accurate when classifying data, some publicly available peak detection algorithms for Matrix-Assisted Laser Desorption Ionization Mass Spectrometry (MALDI-MS) data were recently compared; however, the issue of different feature selection methods and different classification models as they relate to classification performance has not been addressed. With the application of intelligent computing, much progress has been made in the development of feature selection methods and learning classifiers for the analysis of high-throughput biological data. The main objective of this paper is to compare feature selection methods and different learning classifiers when applied to MALDI-MS data and to provide a subsequent reference for the analysis of MS proteomics data.
RESULTS: We compared a well-known method of feature selection, Support Vector Machine Recursive Feature Elimination (SVMRFE), and a recently developed method, Gradient-based Leave-one-out Gene Selection (GLGS), that performs effectively in microarray data analysis. We also compared several learning classifiers, including K-Nearest Neighbor Classifier (KNNC), Naïve Bayes Classifier (NBC), Nearest Mean Scaled Classifier (NMSC), the uncorrelated normal-based quadratic Bayes Classifier (recorded as UDC), Support Vector Machines (SVMs), and a distance metric learning method, the Large Margin Nearest Neighbor classifier (LMNN), based on Mahalanobis distance. For the comparison, we conducted a comprehensive experimental study using three types of MALDI-MS data.
CONCLUSION: Regarding feature selection, SVMRFE outperformed GLGS in classification. As for the learning classifiers, when classification models derived from the best training were compared, SVMs performed the best with respect to the expected testing accuracy. However, the distance metric learning method LMNN outperformed SVMs and the other classifiers when the best testing results were evaluated. In such cases, the optimum classification model based on LMNN is worth investigating in future studies.
format Text
id pubmed-2709264
institution National Center for Biotechnology Information
language English
publishDate 2009
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-2709264 2009-07-14 Comparison of feature selection and classification for MALDI-MS data Liu, Qingzhong Sung, Andrew H Qiao, Mengyu Chen, Zhongxue Yang, Jack Y Yang, Mary Qu Huang, Xudong Deng, Youping BMC Genomics Research BioMed Central 2009-07-07 /pmc/articles/PMC2709264/ /pubmed/19594880 http://dx.doi.org/10.1186/1471-2164-10-S1-S3 Text en Copyright © 2009 Liu et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Comparison of feature selection and classification for MALDI-MS data
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2709264/
https://www.ncbi.nlm.nih.gov/pubmed/19594880
http://dx.doi.org/10.1186/1471-2164-10-S1-S3