Solvable Model for the Linear Separability of Structured Data

Bibliographic Details
Main author:	Gherardi, Marco
Format:	Online Article Text
Language:	English
Published:	MDPI 2021
Subjects:	Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7999416/ https://www.ncbi.nlm.nih.gov/pubmed/33806454 http://dx.doi.org/10.3390/e23030305

id	pubmed-7999416
collection	PubMed
institution	National Center for Biotechnology Information
record_format	MEDLINE/PubMed
author	Gherardi, Marco
title	Solvable Model for the Linear Separability of Structured Data
journal	Entropy (Basel)
publisher	MDPI
publishDate	2021-03-04
format	Online Article Text
language	English
topic	Article
description	Linear separability, a core concept in supervised machine learning, refers to whether the labels of a data set can be captured by the simplest possible machine: a linear classifier. In order to quantify linear separability beyond this single bit of information, one needs models of data structure parameterized by interpretable quantities, and tractable analytically. Here, I address one class of models with these properties, and show how a combinatorial method allows for the computation, in a mean field approximation, of two useful descriptors of linear separability, one of which is closely related to the popular concept of storage capacity. I motivate the need for multiple metrics by quantifying linear separability in a simple synthetic data set with controlled correlations between the points and their labels, as well as in the benchmark data set MNIST, where the capacity alone paints an incomplete picture. The analytical results indicate a high degree of “universality”, or robustness with respect to the microscopic parameters controlling data structure.
license	© 2021 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7999416/ https://www.ncbi.nlm.nih.gov/pubmed/33806454 http://dx.doi.org/10.3390/e23030305
