Classifier uncertainty: evidence, potential impact, and probabilistic treatment
Classifiers are often tested on relatively small data sets, which should lead to uncertain performance metrics. Nevertheless, these metrics are usually taken at face value. We present an approach to quantify the uncertainty of classification performance metrics, based on a probability model of the c...
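
The summary above does not say which probability model the authors use. As a rough illustration only, the sketch below assumes one common choice: the four confusion-matrix counts are treated as multinomial with a symmetric Dirichlet prior, and posterior samples of the cell probabilities are turned into samples of each metric. The function names, the prior value, and the example counts are hypothetical and are not taken from the article.

```python
import numpy as np


def metric_posterior_samples(tp, fn, tn, fp, n_samples=20000, prior=1.0, seed=0):
    """Draw posterior samples of common metrics from confusion-matrix counts.

    Assumption (not from the article): counts are multinomial and the cell
    probabilities have a symmetric Dirichlet(prior) prior.
    """
    rng = np.random.default_rng(seed)
    counts = np.array([tp, fn, tn, fp], dtype=float)
    # Conjugacy: Dirichlet prior + multinomial counts gives a Dirichlet posterior.
    p = rng.dirichlet(counts + prior, size=n_samples)
    p_tp, p_fn, p_tn, p_fp = p.T
    return {
        "accuracy": p_tp + p_tn,
        "sensitivity": p_tp / (p_tp + p_fn),
        "specificity": p_tn / (p_tn + p_fp),
        "precision": p_tp / (p_tp + p_fp),
    }


def credible_interval(samples, level=0.95):
    """Return the central credible interval of the given samples."""
    lower = (1.0 - level) / 2.0
    low, high = np.quantile(samples, [lower, 1.0 - lower])
    return float(low), float(high)


if __name__ == "__main__":
    # Hypothetical test set of 50 samples.
    posterior = metric_posterior_samples(tp=18, fn=2, tn=25, fp=5)
    for name, samples in posterior.items():
        low, high = credible_interval(samples)
        print(f"{name}: 95% credible interval {low:.2f} to {high:.2f}")
```

Only the confusion matrix is needed as input, which matches the summary's claim that the approach is agnostic of the underlying classifier. Rerunning the sketch with larger hypothetical counts shows how the intervals narrow as the test set grows, which is the idea behind using such a model for sample-size estimation.
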
Main Authors: | Tötsch, Niklas, Hoffmann, Daniel |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | PeerJ Inc. 2021 |
Subjects: | Computational Biology |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7959610/ https://www.ncbi.nlm.nih.gov/pubmed/33817044 http://dx.doi.org/10.7717/peerj-cs.398 |
_version_ | 1783664987018887168 |
---|---|
author | Tötsch, Niklas Hoffmann, Daniel |
collection | PubMed |
description | Classifiers are often tested on relatively small data sets, which should lead to uncertain performance metrics. Nevertheless, these metrics are usually taken at face value. We present an approach to quantify the uncertainty of classification performance metrics, based on a probability model of the confusion matrix. Application of our approach to classifiers from the scientific literature and a classification competition shows that uncertainties can be surprisingly large and limit performance evaluation. In fact, some published classifiers may be misleading. The application of our approach is simple and requires only the confusion matrix. It is agnostic of the underlying classifier. Our method can also be used for the estimation of sample sizes that achieve a desired precision of a performance metric. |
format | Online Article Text |
id | pubmed-7959610 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | PeerJ Inc. |
record_format | MEDLINE/PubMed |
spelling | pubmed-79596102021-04-02 Classifier uncertainty: evidence, potential impact, and probabilistic treatment Tötsch, Niklas Hoffmann, Daniel PeerJ Comput Sci Computational Biology Classifiers are often tested on relatively small data sets, which should lead to uncertain performance metrics. Nevertheless, these metrics are usually taken at face value. We present an approach to quantify the uncertainty of classification performance metrics, based on a probability model of the confusion matrix. Application of our approach to classifiers from the scientific literature and a classification competition shows that uncertainties can be surprisingly large and limit performance evaluation. In fact, some published classifiers may be misleading. The application of our approach is simple and requires only the confusion matrix. It is agnostic of the underlying classifier. Our method can also be used for the estimation of sample sizes that achieve a desired precision of a performance metric. PeerJ Inc. 2021-03-04 /pmc/articles/PMC7959610/ /pubmed/33817044 http://dx.doi.org/10.7717/peerj-cs.398 Text en © 2021 Tötsch and Hoffmann https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/) , which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Computer Science) and either DOI or URL of the article must be cited. |
spellingShingle | Computational Biology Tötsch, Niklas Hoffmann, Daniel Classifier uncertainty: evidence, potential impact, and probabilistic treatment |
title | Classifier uncertainty: evidence, potential impact, and probabilistic treatment |
topic | Computational Biology |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7959610/ https://www.ncbi.nlm.nih.gov/pubmed/33817044 http://dx.doi.org/10.7717/peerj-cs.398 |