Classification of premalignant pancreatic cancer mass-spectrometry data using decision tree ensembles

Bibliographic details
Main authors:	Ge, Guangtao; Wong, G William
Format:	Text
Language:	English
Published:	BioMed Central, 2008
Subjects:	Methodology Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2440392/ https://www.ncbi.nlm.nih.gov/pubmed/18547427 http://dx.doi.org/10.1186/1471-2105-9-275

author	Ge, Guangtao; Wong, G William
collection	PubMed
description	BACKGROUND: Pancreatic cancer is the fourth leading cause of cancer death in the United States. Consequently, identification of clinically relevant biomarkers for the early detection of this cancer type is urgently needed. In recent years, proteomics profiling techniques combined with various data analysis methods have been successfully used to gain critical insights into processes and mechanisms underlying pathologic conditions, particularly as they relate to cancer. However, the high dimensionality of proteomics data combined with their relatively small sample sizes poses a significant challenge to current data mining methodology, where many of the standard methods cannot be applied directly. Here, we propose a novel methodological framework using machine learning methods, in which decision-tree-based classifier ensembles coupled with feature selection methods are applied to proteomics data generated from premalignant pancreatic cancer. RESULTS: This study explores the utility of three different feature selection schemas (Student's t-test, Wilcoxon rank-sum test and genetic algorithm) to reduce the high dimensionality of a pancreatic cancer proteomic dataset. Using the top features selected from each method, we compared the prediction performances of a single decision tree algorithm, C4.5, with six different decision-tree-based classifier ensembles (Random forest, Stacked generalization, Bagging, Adaboost, Logitboost and Multiboost). We show that ensemble classifiers always outperform the single decision tree classifier in having greater accuracies and smaller prediction errors when applied to a pancreatic cancer proteomics dataset. CONCLUSION: In our cross-validation framework, classifier ensembles generally have better classification accuracies than a single decision tree when applied to a pancreatic cancer proteomic dataset, thus suggesting their utility in future proteomics data analysis. Additionally, the use of feature selection methods allows us to select biomarkers with potentially important roles in cancer development, therefore highlighting the validity of this method.
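A minimal, hypothetical sketch of the kind of pipeline this abstract describes: rank features by a two-sample Student's t statistic (or a Wilcoxon rank-sum z score), keep the top k, then compare a single tree against a tree ensemble under k-fold cross-validation. It is not the authors' code. As simplifications assumed here, a one-level decision stump stands in for C4.5, bagging stands in for all six ensemble methods, the genetic-algorithm selector is omitted, and features are re-selected inside each training fold. All function names (`t_statistic`, `rank_sum_z`, `select_top_k`, `fit_stump`, `fit_bagging`, `cross_validate`) are illustrative.

```python
# Hypothetical sketch: illustrative names only, standard library only.
import math
import random
from statistics import mean, variance


def t_statistic(a, b):
    """Two-sample Student t statistic with pooled variance."""
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    se = math.sqrt(sp2 * (1 / na + 1 / nb))
    diff = mean(a) - mean(b)
    if se == 0:  # zero spread: perfectly separated (or identical) groups
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / se


def rank_sum_z(a, b):
    """Wilcoxon rank-sum z score (normal approximation, mid-ranks for ties)."""
    ordered = sorted(list(a) + list(b))
    last = {v: pos + 1 for pos, v in enumerate(ordered)}  # highest 1-based rank of v
    first = {v: pos + 1 for pos, v in reversed(list(enumerate(ordered)))}  # lowest
    w = sum((first[v] + last[v]) / 2 for v in a)  # rank sum of group a (mid-ranks)
    na, nb = len(a), len(b)
    mu = na * (na + nb + 1) / 2  # expected rank sum of group a under H0
    sigma = math.sqrt(na * nb * (na + nb + 1) / 12)  # no tie correction
    return (w - mu) / sigma


def select_top_k(X, y, k, score=t_statistic):
    """Indices of the k features with the largest |score| (class 1 vs class 0)."""
    scores = []
    for j in range(len(X[0])):
        a = [row[j] for row, lab in zip(X, y) if lab == 1]
        b = [row[j] for row, lab in zip(X, y) if lab == 0]
        scores.append(abs(score(a, b)))
    return sorted(range(len(scores)), key=lambda j: -scores[j])[:k]


def fit_stump(X, y, features):
    """One-level tree (feature, threshold, left label) maximising training accuracy."""
    best_acc, best = -1, None
    for j in features:
        for thr in sorted({row[j] for row in X}):
            for left in (0, 1):
                acc = sum(predict_stump((j, thr, left), row) == t for row, t in zip(X, y))
                if acc > best_acc:
                    best_acc, best = acc, (j, thr, left)
    return best


def predict_stump(stump, row):
    j, thr, left = stump
    return left if row[j] <= thr else 1 - left


def fit_bagging(X, y, features, n_estimators=25, rng=None):
    """Bagging: one stump per bootstrap resample of the training set."""
    rng = random.Random(0) if rng is None else rng
    models = []
    for _ in range(n_estimators):
        idx = [rng.randrange(len(X)) for _ in X]  # sample n rows with replacement
        models.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], features))
    return models


def predict_bagging(models, row):
    return int(2 * sum(predict_stump(m, row) for m in models) > len(models))  # majority vote


def cross_validate(X, y, k_features, n_folds=5, score=t_statistic, seed=0):
    """Accuracy of one stump vs. bagged stumps; features are re-selected on each
    training fold so the held-out fold never influences selection."""
    rng = random.Random(seed)
    idx = list(range(len(X)))
    rng.shuffle(idx)
    correct = {"single": 0, "bagged": 0}
    for f in range(n_folds):
        test = set(idx[f::n_folds])
        train = [i for i in idx if i not in test]
        Xtr, ytr = [X[i] for i in train], [y[i] for i in train]
        feats = select_top_k(Xtr, ytr, k_features, score)
        stump, bag = fit_stump(Xtr, ytr, feats), fit_bagging(Xtr, ytr, feats, rng=rng)
        for i in test:
            correct["single"] += predict_stump(stump, X[i]) == y[i]
            correct["bagged"] += predict_bagging(bag, X[i]) == y[i]
    return {name: c / len(X) for name, c in correct.items()}


if __name__ == "__main__":
    rng = random.Random(42)
    y = [i % 2 for i in range(40)]
    # 50 synthetic features; only feature 3 is shifted for class 1
    X = [[rng.gauss(0, 1) + (2.0 if j == 3 and lab else 0.0) for j in range(50)] for lab in y]
    print(select_top_k(X, y, 5), cross_validate(X, y, k_features=5))
```

Run as a script, it prints the selected feature indices and the two cross-validated accuracies for synthetic data. The numbers are illustrative only: bagging mainly reduces the variance of unstable learners, so with low-variance stumps the ensemble need not beat the single model, and nothing here reproduces the study's results.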
format	Text
id	pubmed-2440392
institution	National Center for Biotechnology Information
language	English
publishDate	2008
publisher	BioMed Central
record_format	MEDLINE/PubMed
journal	BMC Bioinformatics
published	2008-06-11
rights	Copyright © 2008 Ge and Wong; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title	Classification of premalignant pancreatic cancer mass-spectrometry data using decision tree ensembles
topic	Methodology Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2440392/ https://www.ncbi.nlm.nih.gov/pubmed/18547427 http://dx.doi.org/10.1186/1471-2105-9-275