
An adaptive optimal ensemble classifier via bagging and rank aggregation with applications to high dimensional data


Bibliographic Details
Main Authors: Datta, Susmita; Pihur, Vasyl; Datta, Somnath
Format: Text
Language: English
Published: BioMed Central 2010
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2933716/
https://www.ncbi.nlm.nih.gov/pubmed/20716381
http://dx.doi.org/10.1186/1471-2105-11-427
author Datta, Susmita
Pihur, Vasyl
Datta, Somnath
collection PubMed
description BACKGROUND: Different classifiers tend to work well for different types of data, and it is usually not known a priori which algorithm will be optimal in a given classification application. In addition, for most classification problems, selecting the best-performing algorithm among a number of competing algorithms is difficult for various reasons. For example, the ordering of performance may depend on the performance measure used for the comparison. In this work, we present a novel adaptive ensemble classifier, constructed by combining bagging and rank aggregation, that is capable of adapting its performance to the type of data being classified. The attractive feature of the proposed classifier is its multi-objective nature: the classification results can be optimized simultaneously with respect to several performance measures, for example accuracy, sensitivity and specificity. We also show that our somewhat complex strategy has better predictive performance, as judged on test samples, than a more naive approach that attempts to identify the optimal classifier directly from the training-data performance of the individual classifiers. RESULTS: We illustrate the proposed method with two simulated and two real-data examples. In all cases, the ensemble classifier performs at the level of the best individual classifier in the ensemble or better. CONCLUSIONS: For complex high-dimensional datasets resulting from present-day high-throughput experiments, it may be wise to consider a number of classification algorithms combined with dimension reduction techniques rather than a fixed standard algorithm set a priori.
format Text
id pubmed-2933716
institution National Center for Biotechnology Information
language English
publishDate 2010
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-2933716 2010-09-07 An adaptive optimal ensemble classifier via bagging and rank aggregation with applications to high dimensional data Datta, Susmita Pihur, Vasyl Datta, Somnath BMC Bioinformatics Methodology Article
BioMed Central 2010-08-18 /pmc/articles/PMC2933716/ /pubmed/20716381 http://dx.doi.org/10.1186/1471-2105-11-427 Text en Copyright ©2010 Datta et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title An adaptive optimal ensemble classifier via bagging and rank aggregation with applications to high dimensional data
topic Methodology Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2933716/
https://www.ncbi.nlm.nih.gov/pubmed/20716381
http://dx.doi.org/10.1186/1471-2105-11-427
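
The abstract in this record describes an ensemble that combines bagging with rank aggregation over several performance measures. The following is a minimal sketch of that general idea, not the authors' implementation. Everything in it is an assumption made for illustration: the three toy learners (`nearest_centroid`, `one_nn`, `majority`), the use of out-of-bag samples for scoring, the parameter `n_boot`, and in particular the Borda-style rank sum in `borda_best`, which stands in for whatever rank aggregation scheme the paper actually uses.

```python
import random
from collections import Counter

def nearest_centroid(train):
    """Toy learner (hypothetical): assign the class with the closest mean."""
    sums, counts = {}, Counter()
    for x, y in train:
        counts[y] += 1
        acc = sums.setdefault(y, [0.0] * len(x))
        for j, v in enumerate(x):
            acc[j] += v
    cents = {y: [v / counts[y] for v in s] for y, s in sums.items()}
    return lambda x: min(
        cents, key=lambda y: sum((a - b) ** 2 for a, b in zip(x, cents[y])))

def one_nn(train):
    """Toy learner (hypothetical): label of the single nearest training point."""
    return lambda x: min(
        train, key=lambda t: sum((a - b) ** 2 for a, b in zip(x, t[0])))[1]

def majority(train):
    """Toy learner (hypothetical): always predict the most common label."""
    label = Counter(y for _, y in train).most_common(1)[0][0]
    return lambda x: label

def measures(pred, truth):
    """Accuracy, sensitivity, specificity for labels in {0, 1} (1 = positive)."""
    tp = sum(p == 1 and t == 1 for p, t in zip(pred, truth))
    tn = sum(p == 0 and t == 0 for p, t in zip(pred, truth))
    pos = sum(t == 1 for t in truth)
    neg = len(truth) - pos
    return ((tp + tn) / len(truth),
            tp / pos if pos else 0.0,
            tn / neg if neg else 0.0)

def borda_best(scores):
    """Rank learners per measure (higher score = better) and return the one
    with the lowest rank sum. Assumed stand-in for rank aggregation."""
    names = sorted(scores)
    total = dict.fromkeys(names, 0)
    n_measures = len(next(iter(scores.values())))
    for k in range(n_measures):
        ordered = sorted(names, key=lambda n: -scores[n][k])
        for rank, n in enumerate(ordered):
            total[n] += rank
    return min(names, key=lambda n: total[n])

def fit_ensemble(data, learners, n_boot=25, seed=0):
    """For each bootstrap sample, keep the learner ranked best out of bag."""
    rng = random.Random(seed)
    chosen = []
    for _ in range(n_boot):
        idx = [rng.randrange(len(data)) for _ in data]
        in_bag = [data[i] for i in idx]
        oob = [data[i] for i in set(range(len(data))) - set(idx)]
        if not oob:  # nothing left to score on; skip this resample
            continue
        truth = [y for _, y in oob]
        models = {name: fit(in_bag) for name, fit in learners.items()}
        scores = {name: measures([m(x) for x, _ in oob], truth)
                  for name, m in models.items()}
        best = borda_best(scores)
        chosen.append((best, models[best]))
    return chosen

def predict_ensemble(chosen, x):
    """Majority vote over the per-bootstrap winners."""
    return Counter(model(x) for _, model in chosen).most_common(1)[0][0]

# Usage on synthetic two-class data (made up for this sketch).
rng = random.Random(1)
data = [([rng.gauss(c * 3, 1), rng.gauss(c * 3, 1)], c)
        for c in (0, 1) for _ in range(30)]
learners = {"centroid": nearest_centroid, "1nn": one_nn, "majority": majority}
ensemble = fit_ensemble(data, learners)
print(Counter(name for name, _ in ensemble))   # which learner won how often
print(predict_ensemble(ensemble, [3.0, 3.0]))
```

Because a winner is chosen separately for each bootstrap sample, the mix of learners in the final vote can shift with the data, which is one way to read the "adaptive" behaviour the abstract refers to.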