Solvable Model for the Linear Separability of Structured Data

Bibliographic Details
Main author: Gherardi, Marco
Format: Online Article Text
Language: English
Published: MDPI 2021
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7999416/
https://www.ncbi.nlm.nih.gov/pubmed/33806454
http://dx.doi.org/10.3390/e23030305
Journal: Entropy (Basel)
Publication date: 2021-03-04
Collection: PubMed (National Center for Biotechnology Information)
License: © 2021 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Abstract
Linear separability, a core concept in supervised machine learning, refers to whether the labels of a data set can be captured by the simplest possible machine: a linear classifier. In order to quantify linear separability beyond this single bit of information, one needs models of data structure parameterized by interpretable quantities, and tractable analytically. Here, I address one class of models with these properties, and show how a combinatorial method allows for the computation, in a mean field approximation, of two useful descriptors of linear separability, one of which is closely related to the popular concept of storage capacity. I motivate the need for multiple metrics by quantifying linear separability in a simple synthetic data set with controlled correlations between the points and their labels, as well as in the benchmark data set MNIST, where the capacity alone paints an incomplete picture. The analytical results indicate a high degree of “universality”, or robustness with respect to the microscopic parameters controlling data structure.
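The abstract ties one of the two descriptors to the storage capacity of a linear classifier. As a reference point only, the classical unstructured baseline is Cover's function-counting theorem (1965): for P points in general position in R^N, a linear classifier through the origin can realize C(P, N) = 2 Σ_{k=0}^{N-1} binom(P-1, k) of the 2^P possible labelings. The Python sketch below evaluates that count. It is not the paper's combinatorial mean-field method for structured data, and the function names `cover_count` and `separable_fraction` are illustrative assumptions.

```python
from fractions import Fraction
from math import comb


def cover_count(p: int, n: int) -> int:
    """Cover's count of the dichotomies of p points in general position in R^n
    that a linear classifier through the origin can realize (unstructured case)."""
    if p < 1 or n < 1:
        raise ValueError("p and n must be positive integers")
    # math.comb returns 0 when k > p - 1, so the p <= n case needs no special branch.
    return 2 * sum(comb(p - 1, k) for k in range(n))


def separable_fraction(p: int, n: int) -> Fraction:
    """Exact fraction of the 2**p labelings that are linearly separable."""
    return Fraction(cover_count(p, n), 2 ** p)


if __name__ == "__main__":
    n = 50  # input dimension (illustrative choice)
    for alpha in (1.0, 1.5, 2.0, 2.5, 3.0):
        p = int(alpha * n)  # number of points, alpha = p / n
        frac = float(separable_fraction(p, n))
        print(f"alpha = {alpha:.1f}  fraction separable = {frac:.4f}")
```

For every N the fraction is exactly 1 when P ≤ N and exactly 1/2 at P = 2N, and as N grows it drops sharply around P/N = 2, which is why the capacity of the unstructured case is α_c = 2.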