Win percentage: a novel measure for assessing the suitability of machine classifiers for biological problems

BACKGROUND: Selecting an appropriate classifier for a particular biological application poses a difficult problem for researchers and practitioners alike. In particular, choosing a classifier depends heavily on the features selected. For high-throughput biomedical datasets, feature selection is ofte...

Bibliographic Details
Main authors:	Parry, R Mitchell, Phan, John H, Wang, May D
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2012
Subjects:	Proceedings
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3485616/ https://www.ncbi.nlm.nih.gov/pubmed/22536905 http://dx.doi.org/10.1186/1471-2105-13-S3-S7

author	Parry, R Mitchell; Phan, John H; Wang, May D
collection	PubMed
description	BACKGROUND: Selecting an appropriate classifier for a particular biological application poses a difficult problem for researchers and practitioners alike. In particular, choosing a classifier depends heavily on the features selected. For high-throughput biomedical datasets, feature selection is often a preprocessing step that gives an unfair advantage to the classifiers built with the same modeling assumptions. In this paper, we seek classifiers that are suitable to a particular problem independent of feature selection. We propose a novel measure, called "win percentage", for assessing the suitability of machine classifiers to a particular problem. We define win percentage as the probability a classifier will perform better than its peers on a finite random sample of feature sets, giving each classifier equal opportunity to find suitable features. RESULTS: First, we illustrate the difficulty in evaluating classifiers after feature selection. We show that several classifiers can each perform statistically significantly better than their peers given the right feature set among the top 0.001% of all feature sets. We illustrate the utility of win percentage using synthetic data, and evaluate six classifiers in analyzing eight microarray datasets representing three diseases: breast cancer, multiple myeloma, and neuroblastoma. After initially using all Gaussian gene-pairs, we show that precise estimates of win percentage (within 1%) can be achieved using a smaller random sample of all feature pairs. We show that for these data no single classifier can be considered the best without knowing the feature set. Instead, win percentage captures the non-zero probability that each classifier will outperform its peers based on an empirical estimate of performance. 
CONCLUSIONS: Fundamentally, we illustrate that the selection of the most suitable classifier (i.e., one that is more likely to perform better than its peers) not only depends on the dataset and application but also on the thoroughness of feature selection. In particular, win percentage provides a single measurement that could assist users in eliminating or selecting classifiers for their particular application.
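The definition of win percentage in the description above can be illustrated with a small Monte Carlo sketch. This is not the authors' implementation: the function name `win_percentage`, the toy scores, the number of trials, and the rule that tied classifiers share a win equally are all assumptions made for illustration. Each row of `scores` holds one performance value per classifier for a single feature set. A trial draws a random sample of feature sets, lets every classifier keep its best score within that sample, and credits the classifier with the highest best score.

```python
import random


def win_percentage(scores, sample_size, trials=2000, seed=0):
    """Estimate, per classifier, the percentage of random feature-set
    samples in which that classifier's best score beats its peers.

    scores      -- list of rows; row i holds one score per classifier
                   for feature set i (hypothetical input layout)
    sample_size -- number of feature sets drawn per trial
    """
    rng = random.Random(seed)
    n_classifiers = len(scores[0])
    wins = [0.0] * n_classifiers

    for _ in range(trials):
        # Draw feature sets without replacement; every classifier sees
        # the same sample, so each gets an equal opportunity.
        sample = rng.sample(scores, sample_size)
        # Best score each classifier achieves within this sample.
        best = [max(row[c] for row in sample) for c in range(n_classifiers)]
        top = max(best)
        winners = [c for c, b in enumerate(best) if b == top]
        # Assumption: tied classifiers split the win equally.
        for c in winners:
            wins[c] += 1.0 / len(winners)

    return [100.0 * w / trials for w in wins]


# Toy scores (made up): classifier B is usually worse than classifier A
# but has one outstanding feature set.
toy_scores = [
    [0.70, 0.60],
    [0.72, 0.61],
    [0.71, 0.99],
]

print(win_percentage(toy_scores, sample_size=1))  # shallow feature search
print(win_percentage(toy_scores, sample_size=3))  # exhaustive feature search
```

In this toy setting, classifier A wins most trials when only one feature set is sampled, while classifier B wins every trial once all three feature sets are examined. This mirrors the point in the conclusions that the most suitable classifier depends on the thoroughness of feature selection.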
format	Online Article Text
id	pubmed-3485616
institution	National Center for Biotechnology Information
language	English
publishDate	2012
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-3485616 2012-11-01 BMC Bioinformatics Proceedings BioMed Central 2012-03-21 /pmc/articles/PMC3485616/ /pubmed/22536905 http://dx.doi.org/10.1186/1471-2105-13-S3-S7 Text en Copyright ©2012 Parry et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title	Win percentage: a novel measure for assessing the suitability of machine classifiers for biological problems
topic	Proceedings
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3485616/ https://www.ncbi.nlm.nih.gov/pubmed/22536905 http://dx.doi.org/10.1186/1471-2105-13-S3-S7

Similar items