The Fermi–Dirac distribution provides a calibrated probabilistic output for binary classifiers

Binary classification is one of the central problems in machine-learning research and, as such, investigations of its general statistical properties are of interest. We studied the ranking statistics of items in binary classification problems and observed that there is a formal and surprising relati...

Bibliographic Details
Main authors: Kim, Sung-Cheol, Arun, Adith S., Ahsen, Mehmet Eren, Vogel, Robert, Stolovitzky, Gustavo
Format: Online Article Text
Language: English
Published: National Academy of Sciences 2021
Subjects: Physical Sciences
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8403970/
https://www.ncbi.nlm.nih.gov/pubmed/34413191
http://dx.doi.org/10.1073/pnas.2100761118
author Kim, Sung-Cheol
Arun, Adith S.
Ahsen, Mehmet Eren
Vogel, Robert
Stolovitzky, Gustavo
collection PubMed
description Binary classification is one of the central problems in machine-learning research and, as such, investigations of its general statistical properties are of interest. We studied the ranking statistics of items in binary classification problems and observed that there is a formal and surprising relationship between the probability of a sample belonging to one of the two classes and the Fermi–Dirac distribution determining the probability that a fermion occupies a given single-particle quantum state in a physical system of noninteracting fermions. Using this equivalence, it is possible to compute a calibrated probabilistic output for binary classifiers. We show that the area under the receiver operating characteristics curve (AUC) in a classification problem is related to the temperature of an equivalent physical system. In a similar manner, the optimal decision threshold between the two classes is associated with the chemical potential of an equivalent physical system. Using our framework, we also derive a closed-form expression to calculate the variance for the AUC of a classifier. Finally, we introduce FiDEL (Fermi–Dirac-based ensemble learning), an ensemble learning algorithm that uses the calibrated nature of the classifier’s output probability to combine possibly very different classifiers.
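The correspondence described in the abstract can be illustrated with a minimal sketch. It assumes, as a reading of the abstract rather than a statement of the authors' exact procedure, that samples are ranked 1..N with rank 1 the most likely member of the positive class, and that the class probability at rank r has the Fermi–Dirac form 1 / (1 + exp(beta * (r - mu))), where beta plays the role of an inverse temperature and mu of a chemical potential. The function names, the bisection-based fit, and the example values (N = 100, 30 positives, AUC = 0.8) are hypothetical. The two parameters are chosen so that the probabilities sum to the number of positives and reproduce the mean positive rank implied by the AUC through the standard Mann–Whitney identity.

```python
import math


def fermi_dirac(rank, beta, mu):
    """Fermi-Dirac form 1 / (1 + exp(beta * (rank - mu))), evaluated stably."""
    x = beta * (rank - mu)
    if x >= 0:
        e = math.exp(-x)  # avoids overflow for large positive x
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(x))


def solve_mu(beta, n, n1):
    """Bisect for mu so that the probabilities over ranks 1..n sum to n1."""
    lo, hi = 1 - 50.0 / beta, n + 50.0 / beta  # sum is ~0 at lo and ~n at hi
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        total = sum(fermi_dirac(r, beta, mid) for r in range(1, n + 1))
        if total < n1:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def mean_positive_rank(beta, mu, n):
    """Expected rank of a positive sample under the fitted probabilities."""
    probs = [fermi_dirac(r, beta, mu) for r in range(1, n + 1)]
    return sum(r * p for r, p in zip(range(1, n + 1), probs)) / sum(probs)


def fit_fermi_dirac(n, n1, auc):
    """Hypothetical fit: find (beta, mu) matching the class count and the AUC.

    Assumes 0 < n1 < n and 0.5 < auc < 1 (better than random, not perfect).
    """
    if not (0 < n1 < n and 0.5 < auc < 1.0):
        raise ValueError("need 0 < n1 < n and 0.5 < auc < 1")
    n0 = n - n1
    # Mann-Whitney identity: AUC = 1 + (n1 + 1) / (2 * n0) - <r|1> / n0
    target = n0 * (1.0 - auc) + (n1 + 1) / 2.0
    lo, hi = 0.0, 1.0
    # The mean positive rank decreases as beta grows, so widen until bracketed.
    while mean_positive_rank(hi, solve_mu(hi, n, n1), n) > target:
        hi *= 2.0
    for _ in range(60):
        beta = 0.5 * (lo + hi)
        if mean_positive_rank(beta, solve_mu(beta, n, n1), n) > target:
            lo = beta
        else:
            hi = beta
    beta = 0.5 * (lo + hi)
    return beta, solve_mu(beta, n, n1)


if __name__ == "__main__":
    beta, mu = fit_fermi_dirac(n=100, n1=30, auc=0.8)
    print("beta (inverse temperature):", beta)
    print("mu (chemical potential):", mu)
    print("P(positive | rank 1):", fermi_dirac(1, beta, mu))
```

In this sketch a higher AUC yields a larger beta (a lower temperature, hence a sharper step between the classes), and mu marks the rank at which the fitted probability crosses 1/2, which is the sense in which the abstract ties the decision threshold to the chemical potential.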
format Online
Article
Text
id pubmed-8403970
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher National Academy of Sciences
record_format MEDLINE/PubMed
spelling pubmed-8403970 2021-09-14 The Fermi–Dirac distribution provides a calibrated probabilistic output for binary classifiers Kim, Sung-Cheol Arun, Adith S. Ahsen, Mehmet Eren Vogel, Robert Stolovitzky, Gustavo Proc Natl Acad Sci U S A Physical Sciences National Academy of Sciences 2021-08-24 2021-08-19 /pmc/articles/PMC8403970/ /pubmed/34413191 http://dx.doi.org/10.1073/pnas.2100761118 Text en Copyright © 2021 the Author(s). Published by PNAS. This open access article is distributed under Creative Commons Attribution License 4.0 (CC BY) (https://creativecommons.org/licenses/by/4.0/).
title The Fermi–Dirac distribution provides a calibrated probabilistic output for binary classifiers
topic Physical Sciences
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8403970/
https://www.ncbi.nlm.nih.gov/pubmed/34413191
http://dx.doi.org/10.1073/pnas.2100761118