Cargando…

The Fermi–Dirac distribution provides a calibrated probabilistic output for binary classifiers

Binary classification is one of the central problems in machine-learning research and, as such, investigations of its general statistical properties are of interest. We studied the ranking statistics of items in binary classification problems and observed that there is a formal and surprising relati...

Descripción completa

Detalles Bibliográficos
Autores principales: Kim, Sung-Cheol, Arun, Adith S., Ahsen, Mehmet Eren, Vogel, Robert, Stolovitzky, Gustavo
Formato: Online Artículo Texto
Lenguaje:English
Publicado: National Academy of Sciences 2021
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8403970/
https://www.ncbi.nlm.nih.gov/pubmed/34413191
http://dx.doi.org/10.1073/pnas.2100761118
Descripción
Sumario:Binary classification is one of the central problems in machine-learning research and, as such, investigations of its general statistical properties are of interest. We studied the ranking statistics of items in binary classification problems and observed that there is a formal and surprising relationship between the probability of a sample belonging to one of the two classes and the Fermi–Dirac distribution determining the probability that a fermion occupies a given single-particle quantum state in a physical system of noninteracting fermions. Using this equivalence, it is possible to compute a calibrated probabilistic output for binary classifiers. We show that the area under the receiver operating characteristics curve (AUC) in a classification problem is related to the temperature of an equivalent physical system. In a similar manner, the optimal decision threshold between the two classes is associated with the chemical potential of an equivalent physical system. Using our framework, we also derive a closed-form expression to calculate the variance for the AUC of a classifier. Finally, we introduce FiDEL (Fermi–Dirac-based ensemble learning), an ensemble learning algorithm that uses the calibrated nature of the classifier’s output probability to combine possibly very different classifiers.