The Fermi–Dirac distribution provides a calibrated probabilistic output for binary classifiers
Binary classification is one of the central problems in machine-learning research and, as such, investigations of its general statistical properties are of interest. We studied the ranking statistics of items in binary classification problems and observed that there is a formal and surprising relati...
Main authors: | Kim, Sung-Cheol; Arun, Adith S.; Ahsen, Mehmet Eren; Vogel, Robert; Stolovitzky, Gustavo |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | National Academy of Sciences, 2021 |
Subjects: | Physical Sciences |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8403970/ https://www.ncbi.nlm.nih.gov/pubmed/34413191 http://dx.doi.org/10.1073/pnas.2100761118 |
_version_ | 1783746078855659520 |
---|---|
author | Kim, Sung-Cheol Arun, Adith S. Ahsen, Mehmet Eren Vogel, Robert Stolovitzky, Gustavo |
author_facet | Kim, Sung-Cheol Arun, Adith S. Ahsen, Mehmet Eren Vogel, Robert Stolovitzky, Gustavo |
author_sort | Kim, Sung-Cheol |
collection | PubMed |
description | Binary classification is one of the central problems in machine-learning research and, as such, investigations of its general statistical properties are of interest. We studied the ranking statistics of items in binary classification problems and observed that there is a formal and surprising relationship between the probability of a sample belonging to one of the two classes and the Fermi–Dirac distribution determining the probability that a fermion occupies a given single-particle quantum state in a physical system of noninteracting fermions. Using this equivalence, it is possible to compute a calibrated probabilistic output for binary classifiers. We show that the area under the receiver operating characteristics curve (AUC) in a classification problem is related to the temperature of an equivalent physical system. In a similar manner, the optimal decision threshold between the two classes is associated with the chemical potential of an equivalent physical system. Using our framework, we also derive a closed-form expression to calculate the variance for the AUC of a classifier. Finally, we introduce FiDEL (Fermi–Dirac-based ensemble learning), an ensemble learning algorithm that uses the calibrated nature of the classifier’s output probability to combine possibly very different classifiers. |
format | Online Article Text |
id | pubmed-8403970 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | National Academy of Sciences |
record_format | MEDLINE/PubMed |
spelling | pubmed-8403970 2021-09-14 The Fermi–Dirac distribution provides a calibrated probabilistic output for binary classifiers Kim, Sung-Cheol Arun, Adith S. Ahsen, Mehmet Eren Vogel, Robert Stolovitzky, Gustavo Proc Natl Acad Sci U S A Physical Sciences Binary classification is one of the central problems in machine-learning research and, as such, investigations of its general statistical properties are of interest. We studied the ranking statistics of items in binary classification problems and observed that there is a formal and surprising relationship between the probability of a sample belonging to one of the two classes and the Fermi–Dirac distribution determining the probability that a fermion occupies a given single-particle quantum state in a physical system of noninteracting fermions. Using this equivalence, it is possible to compute a calibrated probabilistic output for binary classifiers. We show that the area under the receiver operating characteristics curve (AUC) in a classification problem is related to the temperature of an equivalent physical system. In a similar manner, the optimal decision threshold between the two classes is associated with the chemical potential of an equivalent physical system. Using our framework, we also derive a closed-form expression to calculate the variance for the AUC of a classifier. Finally, we introduce FiDEL (Fermi–Dirac-based ensemble learning), an ensemble learning algorithm that uses the calibrated nature of the classifier’s output probability to combine possibly very different classifiers. National Academy of Sciences 2021-08-24 2021-08-19 /pmc/articles/PMC8403970/ /pubmed/34413191 http://dx.doi.org/10.1073/pnas.2100761118 Text en Copyright © 2021 the Author(s). Published by PNAS. https://creativecommons.org/licenses/by/4.0/ This open access article is distributed under Creative Commons Attribution License 4.0 (CC BY) (https://creativecommons.org/licenses/by/4.0/). |
spellingShingle | Physical Sciences Kim, Sung-Cheol Arun, Adith S. Ahsen, Mehmet Eren Vogel, Robert Stolovitzky, Gustavo The Fermi–Dirac distribution provides a calibrated probabilistic output for binary classifiers |
title | The Fermi–Dirac distribution provides a calibrated probabilistic output for binary classifiers |
title_sort | fermi–dirac distribution provides a calibrated probabilistic output for binary classifiers |
topic | Physical Sciences |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8403970/ https://www.ncbi.nlm.nih.gov/pubmed/34413191 http://dx.doi.org/10.1073/pnas.2100761118 |
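
The description field above relates the probability that a ranked sample belongs to one class to the Fermi–Dirac occupation function, with the AUC tied to a temperature and the decision threshold to a chemical potential. The record itself gives no formulas, so the following is only a minimal sketch of the standard Fermi–Dirac form applied to ranks. The function name, the convention that rank 1 is the sample most confidently assigned to the positive class, and the values of `beta` (inverse temperature) and `mu` (chemical potential) are all assumptions made for illustration, not the authors' fitted quantities or their FiDEL implementation.

```python
import math


def fermi_dirac_probability(rank, beta, mu):
    """Return 1 / (1 + exp(beta * (rank - mu))), the Fermi-Dirac form.

    Here `rank` plays the role of the energy level, `beta` the inverse
    temperature, and `mu` the chemical potential (all illustrative).
    """
    x = beta * (rank - mu)
    # Branch on the sign of x so that exp() never overflows.
    if x >= 0:
        e = math.exp(-x)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(x))


# Hypothetical parameters: 100 ranked samples, threshold near rank 40.
beta, mu = 0.05, 40.0
probs = [fermi_dirac_probability(r, beta, mu) for r in range(1, 101)]

# The probability crosses 0.5 exactly at rank == mu, which is why mu
# behaves like a decision threshold in this picture.
print(fermi_dirac_probability(40, beta, mu))
```

In this sketch a larger `beta` (lower temperature) makes the transition around `mu` sharper, which matches the qualitative statement in the description that classifier quality is related to the temperature of the equivalent system; the exact mapping from AUC to `beta` is not reproduced here.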